Abstract

A  non-real  time  10  pole  recursive  autocorrelation  linear 
predictive  coding  vocoder  was  created  for  use  in  studying 
effects  of  recursive  autocorrelation  on  speech.  The  vocoder 
is  composed  of  two  interchangeable  pitch  detectors,  a  speech
analyzer,  and  speech  synthesizer.  The  time  between  updating 
filter  coefficients  is  allowed  to  vary  from  .125  msec  to  20 
msec.  The  best  quality  was  found  using  .125  msec  between 
each  update.  The  greatest  change  in  quality  was  noted  when 
changing  from  20  msec/update  to  10  msec/update. 

Pitch period plots for the center clipping autocorrelation
pitch detector (AUTOC) and the simplified inverse filtering
technique (SIFT) are provided. Plots of speech into and out
of the vocoder are given. Formant versus time plots are
shown.

Effects of noise on pitch detection and formants are
shown. Noise affects the voiced/unvoiced decision process,
causing voiced speech to be reconstructed as unvoiced.




I.  Introduction

Linear  predictive  coding  (LPC)  is  one  of  the  most 
powerful  speech  analysis  methods.  The  importance  of  this 
method  comes  from  its  ability  to  provide  extremely  accurate 
estimates  of  the  speech  parameters  and  its  speed  of 
computation.  Since  the  United  States  Air  Force  officially 
adopted  LPC10,  it  is  useful  to  study  ways  to  improve  it.


Justification 

The  need  for  further  research  on  the  topic  of  LPC  stems 
from  the  three  major  areas  within  the  general  area  of 
man-machine  communication  by  voice.  The  major  areas  are: 

1.  Voice  response  systems;

2.  Speaker  recognition  systems;

3.  Speech  recognition  systems. 

Each of these three areas has applications useful to
the military. Voice response systems are designed to
respond to a request for information, as in a flight
information system. Speaker recognition systems could be
used in speaker verification. Finally, speech recognition
systems could be used on airplanes to update computer
terminal displays. Table 1-1 lists several military tasks
that would be useful to automate and in which LPC
techniques would likely be part of the system.
Therefore, any improvements in LPC systems bring us closer
to realizing these applications.
Table  1-1


Military  Tasks  for  Possible  Automation 


1)  Security 

1.1  Speaker  Verification  (Authentication) 

1.2  Speaker  Identification  (Recognition) 

1.3  Determining  emotional  state  of  speaker  (e.g.,
stress  effects)

1.4  Recognition  of  spoken  codes 

1.5  Secure  access  voice  identification,  whether  or  not 
in  combination  with  fingerprints,  identity  card, 
etc.

1.6  Surveillance  of  communication  channels 

2)  Command  and  Control 

2.1  System  control  (ships,  aircraft,  fire  control, 
situation  displays,  etc.) 

2.2  Voice-operation  computer  input/output  (each
telephone  a  terminal)

2.3  Data  handling  and  record  control 

2.4  Material  handling  (mail,  baggage,  publications, 
industrial  applications) 

2.5  Remote  control  (dangerous  material) 

2.6  Administrative  record  control 

3)  Data  Transmission  and  Communication 

3.1  Speech  synthesis 

3.2  Vocoder  systems 

3.3  Bandwidth  reduction  or,  more  generally,  bit-rate
reduction

3.4  Ciphering/coding/scrambling 

4)  Processing  Distorted  Speech 

4.1  Diver  speech 

4.2  Astronaut  Communication 

4.3  Underwater  telephone 

4.4  Oxygen  mask  speech 

4.5  High  "G"  force  speech 


Background 


Linear  predictive  coding  (LPC)  systems  have  been  very 
successful  in  speech  analysis  and  synthesis  systems.  They 
provide  a  robust,  reliable  and  accurate  method  for 
estimating  the  parameters  that  characterize  the  linear 
time-varying  speech  system.  Traditionally  there  are  three 
commonly  used  methods  for  linear  predictive  analysis.  These 
are  the  autocorrelation  formulation,  covariance  method,  and 
the  lattice  method.  This  thesis  is  concerned  with  the
autocorrelation  method. 

The  autocorrelation  method  requires  windowing  operations
and  buffering  operations  in  addition  to  extensive
computations  (multiplies  and  adds).  However,  Barnwell  (Ref
2)  originated  a  recursive  technique  that  uses  an  infinite 
length  window  which  is  also  the  impulse  response  of  a 
recursive  digital  filter.  Thus  the  autocorrelation  can  be 
done  in  a  recursive  manner  resulting  in  computational 
advantages,  great  reductions  in  buffer  storage  requirements, 
and  simpler  control  logic  requirements.  Since  this  idea  is
new,  many  details  of  the  recursive  system  raise  unanswered
questions,  some  of  which  are  the  topic  of  this  thesis.

Problem 

The  objective  of  this  thesis  is  to  create  a  recursive 
linear  predictive  coding  (RLPC)  vocoder  system  on  the  Data 
General  Nova-Eclipse  system.  After  implementation,  some  of 
the  benefits  of  the  recursive  nature  will  be  examined. 
The  vocoder  is  designed  to  be  a  tool  for  analyzing  the
effects  of  using  recursive  linear  predictive  coding  with 
speech.  The  system  and  utilities  will  allow  the  user  to 
create  artificial  vowels,  plot  synthesized  speech,  and  make 
log  magnitude  plots  for  observing  formant  changes.  The 
vocoder  will  not  be  a  real  time  system. 

Approach 

The  RLPC  system  will  be  created  as  a  three  part  system 
allowing  versatility  for  experimentation.  The  first  part 
will  be  a  pitch  detector,  the  second  will  be  the  speech 
analyzer  and  the  third  will  be  the  speech  synthesizer. 
There  are  two  particular  advantages  to  this  approach. 
First,  intermediate  data  (for  example,  the  predictor
coefficients  as  a  function  of  time  out  of  the  analyzer)  are
readily  available  for  data  analysis  without  having  to  run
the  complete  system,  thus  saving  time.  Second,  a  three  part
system  allows  the  user  to  replace  any  part  of  the  system
with  a  different  functional  unit  (for  example,  the  AUTOC
pitch  detector  could  be  replaced  with  a  SIFT  pitch
detector).


II.  Linear  Predictive  Coding

Linear  predictive  coding  (LPC)  is  a  commonly  used 
technique  for  estimating  the  basic  speech  parameters  (for 
example,  pitch,  formants,  spectra,  and  vocal  tract  area 
functions).  This  method  is  important  because  it  provides
extremely  accurate  estimates  of  the  speech  parameters  and  is 
relatively  fast. 

Basically  LPC  is  a  technique  that  uses  a  linear 
combination  of  past  speech  samples  to  approximate  a  speech 
sample.  A  unique  set  of  predictor  coefficients  is
determined  by  minimizing  the  sum  of  the  squared  differences
between  the  actual  speech  samples  and  the  linearly  predicted
ones  (Ref  7).

Three  commonly  used  methods  for  formulating  linear 
prediction  analysis  are:  autocorrelation  formulation, 
covariance  method,  and  the  lattice  method.  Only  the  first 
method  will  be  discussed  in  this  thesis.  The 
autocorrelation  formulation  can  be  considered  to  be  made  up 
of  two  subtasks:  the  calculation  of  the  autocorrelation 
function,  and  the  matrix  inversion  of  the  autocorrelation 
matrix.  The  details  of  the  second  task  will  be  discussed  in 
chapter  III.  The  first  task  is  usually  accomplished  by
evaluating  a  digital  sequence  using  a  short  time
autocorrelation  function.  However,  in  this  thesis  a
recursive  autocorrelation  algorithm  will  replace  the
typically  used  short  time  autocorrelation  function.

The  rest  of  this  chapter  is  composed  of  a  brief
review  of  basic  linear  predictive  coding  and  a  discussion  of
using  recursion  in  linear  predictive  coding.


LPC  requires  a  sampled  digital  signal.  In  this  case
it  will  be  assumed  that  a  continuous  speech  signal  is 
sampled  at  a  frequency  of  1/T  where  T  is  the  sampling 
period.  If  the  speech  signal  is  s(t)  then  the  sampled 
version  will  be  s(nT).  Henceforth,  for  notational 
simplicity,  s(nT)  will  be  abbreviated  by  s(n)  as  is 
generally  accepted. 

A  model  can  be  created  in  which  the  signal  s(n)  is  the 
output  of  a  system  with  an  unknown  input  u(n)  such  that  the 
following  relation  holds: 


$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G \sum_{l=0}^{q} b_l\, u(n-l), \qquad b_0 = 1 \tag{2.1}$$

where $a_k$, $1 \le k \le p$, $b_l$, $1 \le l \le q$, and $G$ are the
parameters of the system. The past outputs are $s(n-k)$ and
the past and present inputs are $u(n-l)$. Equation (2.1)
describes the current output $s(n)$ as a linear combination of
past outputs and past and present inputs; hence the name
linear prediction.

By taking the Z transform of equation (2.1), the model
can be specified in the frequency domain, where $H(z)$ is the
transfer function given as:

$$H(z) = \frac{S(z)}{U(z)} = G\, \frac{1 + \sum_{l=1}^{q} b_l z^{-l}}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{2.2}$$

where

$$S(z) = \sum_{n=-\infty}^{\infty} s(n)\, z^{-n} \tag{2.3}$$

is the Z transform of $s(n)$ and $U(z)$ is the Z transform of
$u(n)$.

Acoustic theory states that nasal and fricative sounds
require both resonances and anti-resonances (poles and
zeros). However, following the reasoning of Atal (Ref 1), the
effect of a zero of a transfer function can be achieved by
including more poles. This results in an all-pole model
given by:

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{2.4}$$


where all the variables are the same as in equation (2.2).
This all-pole system has the advantage that the gain
parameter, $G$, and the filter coefficients $a_k$ can be
estimated in a straightforward and computationally
efficient manner using linear predictive analysis. Using an
all-pole model simplifies equation (2.1) so that the speech
samples $s(n)$ are related to the input by the following
difference equation:

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n) \tag{2.5}$$


Linear prediction of speech assumes that a sample of
speech, $s(n)$, can be approximated by a weighted sum of the
preceding $p$ samples of speech given by:

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k) \tag{2.6}$$

where $\{\alpha_k\}$ is a set of predictor coefficients. The
difference between the actual speech sample and the linearly
predicted speech (equation (2.6)) is the prediction error and
is given by:

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k) \tag{2.7}$$


where $e(n)$ in equation (2.7) can be thought of as the output
of a system whose transfer function is

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \tag{2.8}$$

By comparing equation (2.5) and equation (2.7) it is noted
that if the speech signal obeys the model of equation (2.5)
exactly, and if $\alpha_k = a_k$, then $e(n) = G\,u(n)$. Therefore, the
prediction error filter, $A(z)$, will be an inverse filter for
the system, $H(z)$, given by:

$$H(z) = \frac{G}{A(z)} \tag{2.9}$$
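The inverse-filter relationship can be checked numerically. The following is a minimal sketch in pure Python, assuming a hypothetical 2-pole model; the function names `synthesize` and `prediction_error` and the coefficient values are illustrative and do not come from the thesis software. It drives the all-pole model of equation (2.5) with an impulse train and then applies the prediction error filter of equation (2.7) with matching coefficients.

```python
def synthesize(a, gain, u):
    """All-pole model of equation (2.5): s(n) = sum_k a[k-1]*s(n-k) + gain*u(n)."""
    s = []
    for n, un in enumerate(u):
        acc = gain * un
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s


def prediction_error(alpha, s):
    """Prediction error of equation (2.7): e(n) = s(n) - sum_k alpha[k-1]*s(n-k)."""
    e = []
    for n, sn in enumerate(s):
        pred = 0.0
        for k, ak in enumerate(alpha, start=1):
            if n - k >= 0:
                pred += ak * s[n - k]
        e.append(sn - pred)
    return e


# Hypothetical stable 2-pole resonator driven by an impulse train.
a = [1.2, -0.8]
u = [1.0 if n % 20 == 0 else 0.0 for n in range(60)]
s = synthesize(a, gain=0.5, u=u)
e = prediction_error(a, s)  # with alpha_k = a_k, e(n) reproduces G*u(n)
```

With the predictor coefficients set equal to the model coefficients, the error sequence `e` matches `0.5 * u(n)` to within rounding, which is the statement e(n) = Gu(n) made above.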


Basically, linear predictive analysis determines a set of
predictor coefficients $\{\alpha_k\}$ from the sampled speech signal
in such a way as to obtain an estimate of the spectral
properties of the speech signal using equation (2.9). Short
segments of the speech signal are used to estimate the
predictor coefficients because of the time-varying nature of
the speech signal. Minimum mean-squared error techniques
are used to determine the predictor coefficients, which are
assumed to be the parameters of the system function, $H(z)$,
used in the model for speech production. This is a good
assumption, and this approach leads to a set of linear
equations that can be efficiently solved to obtain the
predictor parameters.

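The short-time average prediction error defined over a
segment of speech is:

$$E_n = \sum_{m} e_n^2(m) \tag{2.10}$$

$$E_n = \sum_{m} \left[ s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k) \right]^2 \tag{2.11}$$

where $s_n(m)$ is a segment of speech that has been selected in
the vicinity of sample $n$, for example:

$$s_n(m) = s(m+n) \tag{2.12}$$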


Note,  the  limits  of  the  summation  will  not  be  specified  now 
but  will  be  later  since  this  is  a  short-time  analysis 
technique.  Additionally  the  subscript  n  will  be  dropped 
because  it  is  not  needed  in  short-time  analysis. 

By setting $\partial E_n / \partial \alpha_i = 0$ for $i = 1, 2, \ldots, p$, the values of
$\alpha_k$ that minimize $E_n$ can be found from the following
equations:

$$\sum_{m} s(m-i)\, s(m) = \sum_{k=1}^{p} \hat{\alpha}_k \sum_{m} s(m-i)\, s(m-k) \tag{2.13}$$

for $1 \le i \le p$, where $\hat{\alpha}_k$ are the values of $\alpha_k$ that minimize $E_n$.
Henceforth the caret will be dropped and $\alpha_k$ will denote the
values that minimize $E_n$. Defining $\phi_n(i,k)$ as:

$$\phi_n(i,k) = \sum_{m} s(m-i)\, s(m-k) \tag{2.14}$$

equation (2.13) can be rewritten as:

$$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0) \tag{2.15}$$


for $i = 1, 2, \ldots, p$. Solving this set of $p$ equations in $p$
unknowns for the predictor coefficients $\{\alpha_k\}$ that
minimize the average squared prediction error for $s(m)$, we
obtain the minimum mean-squared prediction error:

$$E_n = \sum_{m} s^2(m) - \sum_{k=1}^{p} \alpha_k \sum_{m} s(m)\, s(m-k) \tag{2.16}$$

or using equation (2.14) a simpler expression can be used:

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) \tag{2.17}$$


Solving for the optimum predictor coefficients requires
computing the quantities $\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$, then
solving equation (2.15) for the $\alpha_k$'s. The details of solving
the equations will be discussed using the autocorrelation
method.


Autocorrelation  method 

First the limits on the sums in equations (2.10) and
(2.11) must be determined. One commonly used technique is
to apply a window so that $s_n(m)$ is equal to 0 outside the
interval $0 \le m \le N-1$ (Ref 10). This can be expressed as:

$$s_n(m) = s(m+n)\, w(m) \tag{2.18}$$


where $w(m)$ is a finite length window that is zero outside
the interval $0 \le m \le N-1$. Using these limits, $\phi_n(i,k)$ can be
expressed as:

$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\, s_n(m+i-k) \tag{2.19}$$


for $1 \le i \le p$ and $0 \le k \le p$. The short-time autocorrelation
function is defined as:

$$R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\, w(m)]\, [x(n+m+k)\, w(k+m)] \tag{2.20}$$


Using equation (2.18) with (2.19) it is seen that
equations (2.19) and (2.20) are the same, thus we have

$$\phi_n(i,k) = R_n(i-k) \tag{2.21}$$


Using the fact that the autocorrelation function is an even
function, $E_n$ can be expressed as:

$$E_n = R_n(0) - \sum_{k=1}^{p} \alpha_k\, R_n(k) \tag{2.22}$$


This  set  of  equations  can  be  expressed  in  matrix  form,
creating  a  p  x  p  autocorrelation  matrix  which  is  a
Toeplitz  matrix.  The  autocorrelation  matrix  is  symmetric,
and  all  the  elements  along  a  given  diagonal  are
equal.  One  solution  for  finding  predictor  coefficients  from
the  Toeplitz  matrix  is  to  use  Durbin's  recursion  algorithm
(Ref  7).  The  algorithm  will  be  discussed  in  chapter  III  as
part  of  the  procedure.  This  thesis  does  not  use  the
standard  autocorrelation  technique  just  described  to  find
the  autocorrelation  matrix.  Instead  the  standard  technique
is  replaced  with  a  recursive  autocorrelation  method.
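For comparison with the recursive method that follows, the standard windowed computation of equation (2.20) can be sketched as below. This is an illustrative pure-Python sketch, not the thesis code: the names `hamming` and `short_time_autocorr` are hypothetical, and the Hamming window is used as the default only because it is the usual choice for this method.

```python
import math


def hamming(N):
    """Hamming window of length N (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (N - 1)) for m in range(N)]


def short_time_autocorr(x, n, N, p, window=None):
    """R_n(k) of equation (2.20) for k = 0..p, for the frame starting at sample n."""
    w = window if window is not None else hamming(N)
    # Window the frame once, then form the lag products.
    seg = [x[n + m] * w[m] for m in range(N)]
    return [sum(seg[m] * seg[m + k] for m in range(N - k)) for k in range(p + 1)]


# Example: a short frame with a rectangular window so the sums are easy to check.
R = short_time_autocorr([1.0, 2.0, 3.0, 4.0], n=0, N=4, p=2, window=[1.0] * 4)
```

Every frame requires the N samples to be buffered and windowed before any lag can be computed, which is the buffering cost the recursive method is meant to avoid.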


RLPC 

The  idea  for  using  a  recursive  technique  to  compute  the 
autocorrelation  matrix  comes  from  Barnwell  (Ref  2).  Using 
the  standard  autocorrelation  technique  requires  windowing 
operations  and  buffering  operations  in  addition  to  extensive 
computations  (multiplies  and  adds).  However,  the  recursive
technique  uses  an  infinite  length  window  which  is  also  the 
impulse  response  of  a  recursive  digital  filter.  Thus  the 
autocorrelation  estimation  may  be  made  recursive  resulting 
in  reductions  in  computation  for  some  structures.  Also 
great  reductions  in  the  buffer  storage  requirements  and 
control  logic  requirements  are  realized.  Finally  this 
method  results  in  a  speech  quality  that  is  equivalent  to  the 
traditional  Hamming  window  realization. 

In  the  standard  short-time  autocorrelation  method 
described  in  the  last  section  a  20-30  ms  Hamming  window  is 
typically  used  for  computing  the  autocorrelation  function. 
This  technique  has  two  problems.  First,  for  good  quality 
speech  the  window  areas  must  overlap.  This  causes  many 
speech  samples  to  be  used  in  forming  the  autocorrelation 
functions  for  more  than  one  frame.  Second,  framing  and 
buffering  problems  associated  with  handling  overlapping 
windows  cause  computational  architectures  which  are  complex 
and  unwieldy.


If  the  requirement  for  a  finite  window  is  removed  a 
class  of  windows  can  be  used  which,  though  infinite  in 
length,  are  very  small  in  magnitude  outside  of  some  finite 
region  (for  example  a  21ms  region).  Of  particular  interest 
to  this  thesis  is  a  class  of  windows  which  can  be  formed 
from  the  impulse  response  of  a  second-order  digital  filter 
having  two  real  poles.  This  filter  has  a  z  transform  given 
by: 


$$H(z) = \frac{1}{(1 - \alpha z^{-1})(1 - \beta z^{-1})} \tag{2.23}$$

where $\alpha$ and $\beta$ are the pole locations.

Now define the autocorrelation function for a windowed
sequence as:

$$R(k,m) = \sum_{n=-\infty}^{\infty} s(n)\, s(n+k)\, w(m-n)\, w(m-n-k) \tag{2.24}$$

where $R(k,m)$ is the kth autocorrelation lag for window
placement $m$. Defining:

$$w(n,k) = w(n)\, w(n-k) \tag{2.25}$$

and

$$s(n,k) = s(n)\, s(n+k) \tag{2.26}$$
equation (2.24) can be rewritten as:

$$R(k,m) = \sum_{n=-\infty}^{\infty} s(n,k)\, w(m-n,k) \tag{2.27}$$


From this it can be seen that the kth autocorrelation lag is
the convolution of the function $w(n,k)$ and the sequence
$s(n,k)$. Additionally, $w(n,k)$ is the product of two window
functions; thus the Z transform of $w(n,k)$, $W_k(z)$, is the
complex convolution of the Z transforms of the two window
functions $w(n)$ and $w(n-k)$. Using an infinite-length window
that is the impulse response of a digital filter with
transfer function $H(z)$, $W_k(z)$ is:

$$W_k(z) = \frac{1}{2\pi j} \oint H(v)\, H(z/v)\, v^{k-1}\, dv \tag{2.28}$$


Using the window described in equation (2.23), equation
(2.28) becomes:

$$W_k(z) = \frac{1}{2\pi j} \oint \frac{v^{k-1}}{(1-\alpha v^{-1})(1-\beta v^{-1})(1-\alpha v/z)(1-\beta v/z)}\, dv \tag{2.29}$$

Upon evaluating this equation, $W_k(z)$ becomes:

$$W_k(z) = \frac{b(0,k) + b(1,k)\, z^{-1}}{Q} \tag{2.30}$$
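where

$$Q = 1 - a(1,k)\, z^{-1} - a(2,k)\, z^{-2} - a(3,k)\, z^{-3} \tag{2.31}$$

$$b(0,k) = \frac{\alpha^{k+1} - \beta^{k+1}}{\alpha - \beta} \tag{2.32}$$

$$b(1,k) = \frac{\beta^{k+1}\alpha^{2} - \alpha^{k+1}\beta^{2}}{\alpha - \beta} \tag{2.33}$$

$$a(1,k) = \alpha^{2} + \beta^{2} + \alpha\beta \tag{2.34}$$

$$a(2,k) = -(\alpha^{2}\beta^{2} + \alpha^{3}\beta + \beta^{3}\alpha) \tag{2.35}$$

$$a(3,k) = \alpha^{3}\beta^{3} \tag{2.36}$$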


These equations reduce to a simpler form if $\alpha$ is allowed to
equal $\beta$. The basic recursive structure is shown in figure
II-1.
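As a worked illustration of the equal-pole case, the limit $\beta \to \alpha$ can be taken in equations (2.32) to (2.36). The expressions below are derived here from those equations and are not quoted from the thesis or from Ref 2:

```latex
\begin{aligned}
b(0,k) &= \lim_{\beta \to \alpha} \frac{\alpha^{k+1} - \beta^{k+1}}{\alpha - \beta} = (k+1)\,\alpha^{k} \\
b(1,k) &= \lim_{\beta \to \alpha} \frac{-\alpha^{2}\beta^{2}\,(\alpha^{k-1} - \beta^{k-1})}{\alpha - \beta} = -(k-1)\,\alpha^{k+2} \\
a(1,k) &= 3\alpha^{2}, \qquad a(2,k) = -3\alpha^{4}, \qquad a(3,k) = \alpha^{6} \\
Q &= 1 - 3\alpha^{2} z^{-1} + 3\alpha^{4} z^{-2} - \alpha^{6} z^{-3} = \left(1 - \alpha^{2} z^{-1}\right)^{3}
\end{aligned}
```

In this case the recursive part of the filter is the same for every lag and has a triple pole at $\alpha^{2}$, so only the two numerator coefficients depend on $k$.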

There are several points worth noting about the system in
figure II-1. First, it is a point by point system which
operates identically on every sample. Therefore additional
buffering is not required, which is most useful in a hardware
system. Second, the parameter $\alpha$ completely controls the
window length, and the same number of computations is
required regardless of window length or frame interval.
Finally, only two multiplies are required in the
nonrecursive portion of the filter on every frame interval
and not on every sample (Ref 2).
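The point by point recursion can be sketched as follows. This is a minimal pure-Python illustration, not the thesis software: the function names, the pole values used in the example, and the choice to form the lag product causally as s(n)s(n-k) are all assumptions. Each lag k has its own third-order filter whose coefficients come from equations (2.31) to (2.36), and the filter output at sample m is the lag-k autocorrelation of the speech weighted by the infinite-length window ending at m.

```python
def window_filter_coeffs(alpha, beta, k):
    """Coefficients of W_k(z), equations (2.31)-(2.36); requires alpha != beta."""
    d = alpha - beta
    b0 = (alpha ** (k + 1) - beta ** (k + 1)) / d
    b1 = (beta ** (k + 1) * alpha ** 2 - alpha ** (k + 1) * beta ** 2) / d
    a1 = alpha ** 2 + beta ** 2 + alpha * beta
    a2 = -(alpha ** 2 * beta ** 2 + alpha ** 3 * beta + beta ** 3 * alpha)
    a3 = alpha ** 3 * beta ** 3
    return b0, b1, a1, a2, a3


def recursive_autocorr(s, p, alpha, beta):
    """Return R[k][m] for lags k = 0..p, updated at every sample m."""
    R = []
    for k in range(p + 1):
        b0, b1, a1, a2, a3 = window_filter_coeffs(alpha, beta, k)
        x1 = 0.0            # previous lag product
        y1 = y2 = y3 = 0.0  # previous three filter outputs
        out = []
        for n in range(len(s)):
            # Lag product, formed causally (an assumption of this sketch).
            x0 = s[n] * s[n - k] if n >= k else 0.0
            # Difference equation implied by W_k(z) = (b0 + b1 z^-1) / Q.
            y0 = b0 * x0 + b1 * x1 + a1 * y1 + a2 * y2 + a3 * y3
            out.append(y0)
            x1, y1, y2, y3 = x0, y0, y1, y2
        R.append(out)
    return R


# Example with hypothetical pole locations; R[k][-1] holds lag k at the last sample.
speech = [float((7 * n) % 11 - 5) for n in range(200)]
R = recursive_autocorr(speech, p=10, alpha=0.98, beta=0.97)
```

The sketch evaluates the numerator on every sample for simplicity; in a frame-based system the numerator needs to be evaluated only when the coefficients are updated, which is the third point made above. Only four state values are kept per lag, so no frame buffer is needed.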
In conclusion, RLPC does not differ in kind from other
autocorrelation techniques, because both obtain a
short-time autocorrelation analysis of speech by applying a
window function. Thus the recursive technique retains all
the important virtues of the autocorrelation technique,
including: reflection coefficients whose magnitude is
always less than one, a stable receiver filter, and a
Toeplitz autocorrelation matrix.


[Figure II-1  Recursive Structure: (a) Recursive Autocorrelation Structure, with outputs R(0,n), R(1,n), R(2,n), ..., R(N-1,n), R(N,n); (b) Linear Filter (L.F.)]
III.  The  Vocoder  Model  and  Procedure

This  chapter  describes  how  the  recursive  linear 
predictive  coding  vocoder  used  in  this  thesis  is  designed 
and  implemented.  The  vocoder  system  will  be  discussed  in 
three  major  sections:  the  pitch  detector,  the  speech
analyzer,  and  the  speech  synthesizer.  For  a  software 
description  see  appendix  A  and  for  program  listings  see 
appendix  B. 


Pitch  Detector 

Four  basic  problems  exist  in  detecting  the  pitch  period 
of  a  speech  signal.  First,  the  glottal  excitation  waveform 
is  not  a  perfect  train  of  periodic  pulses  (for  an  excellent
introduction  to  the  mechanics  of  speech  production  see  Ref
10).  Although  finding  the  period  of  a  perfectly  periodic
waveform  is  straightforward,  a  speech  waveform  varies  both
in  period  and  in  the  detailed  waveform  structure  causing 
pitch  period  determination  to  be  difficult.  Secondly,  the 
pitch  period  must  be  extracted  from  an  output  which  is 
caused  by  the  interaction  between  the  vocal  tract  and  the 
glottal  excitation.  In  some  cases  the  formants  of  the  vocal 
tract  alter  the  structure  of  the  glottal  waveform  so  that 
the  actual  pitch  period  is  difficult  to  detect.  The  third 
problem  comes  from  the  inherent  difficulty  in  defining  the 
exact  beginning  and  end  of  each  pitch  period  during  voiced 
speech  segments.  Finally,  the  fourth  difficulty  occurs  in 
distinguishing  between  unvoiced  speech  and  low-level  voiced 
speech.  Often  transitions  between  unvoiced  speech  segments
and  low-level  voiced  speech  segments  are  very  subtle,  thus 
making  them  very  hard  to  pinpoint. 

To find a pitch detector that would perform adequately
given the problems described above, several pitch detectors
were reviewed for use in this LPC vocoder (Ref 4). Two
pitch detectors were chosen. The first is the center
clipping autocorrelation pitch detector (AUTOC) and the
second is a modified version of the simplified inverse
filter tracking (SIFT) pitch detector.

The center clipping autocorrelation pitch detector
(AUTOC) was chosen because its performance was equal to or
better than that of the other pitch detection algorithms.
Specifically, for the fine pitch error category (fewer than
10 continuous errors) AUTOC performed as well as the
cepstrum (CEP) method, the simplified inverse filtering
technique (SIFT), and linear predictive coding (LPC)
using pattern recognition and spectral equalization. In the
category of gross pitch errors (more than 10 continuous
errors) AUTOC had average performance. The second pitch
detector was chosen because its code has been published (Ref
9) and only a few changes were required to obtain a
functioning alternate pitch detector.


The first pitch detector to be discussed is the AUTOC
pitch detector. Figure III-1 is a block diagram of the
AUTOC system. The input is a data file of speech that has
been sampled at 8000 Hz with a range in magnitude from -2048
to 2047 (in 2's complement representation). The original
speech was input in an active laboratory environment with a
maximum possible input level of -5.0 to +5.0 volts. The
input speech data is passed through a low pass filter with a
900 Hz cutoff frequency.

The speech is divided into 10 ms segments called sets,
where three sets make up one frame. Frames are 30 ms long
and each frame overlaps the previous frame by 20 ms (two
sets). The average energy is calculated for each frame
using the following formula:

$$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2 \tag{3.1}$$

A 120 point rectangular window, $w(n)$, was chosen for
windowing the speech. This size is a compromise: it is
short enough to respond to rapid amplitude changes yet long
enough to provide sufficient averaging for a smooth energy
function. The energy is summed at each point in the frame,
and if any of these sums exceeds the voiced/unvoiced
threshold the frame is considered voiced and energy
processing on the current frame stops.
Next the frame is processed by a peak detector algorithm.
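The energy test can be sketched as follows. This is an illustrative pure-Python sketch only: the function name `frame_is_voiced` and the threshold value are hypothetical, since the thesis text does not give the threshold. The frame length of 240 samples follows from 30 ms at 8000 Hz.

```python
def frame_is_voiced(x, start, frame_len=240, win_len=120, threshold=1.0e6):
    """Apply equation (3.1) with a rectangular window at each point of the frame.

    Returns True as soon as one running energy exceeds the threshold, so the
    remaining points of the frame are not processed.
    """
    for n in range(start, start + frame_len):
        lo = max(0, n - win_len + 1)
        # Rectangular window: sum of squares of the last win_len samples up to n.
        energy = sum(v * v for v in x[lo:n + 1])
        if energy > threshold:
            return True
    return False
```

A running sum that adds the newest squared sample and subtracts the oldest would avoid recomputing the whole window at each point; the direct form is used here to stay close to equation (3.1).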

The peak detector algorithm determines the largest
absolute value of speech in the first and third sets of each
frame and compares them. The smaller of the two is
multiplied by a constant (in the range of .6 to .8) and the
result is called the clipping level Cl. The clipping level
is chosen in this manner to avoid over clipping when the
speech varies greatly in intensity from the beginning of the
frame to the end. The clipping level is used in the
center and infinite peak clipper. When a large clipping
level is chosen, fewer peaks exceed the clipping level,
and thus fewer appear in the autocorrelation function. A
smaller clipping level allows more peaks to pass through,
and thus the autocorrelation function is more complex (more
peaks). A higher clipping level is therefore preferred,
since it minimizes the autocorrelation function's
complexity, which in turn simplifies the process of finding
the fundamental autocorrelation peak. By checking at the
beginning and the end of the frame, a reasonably high
clipping level can be chosen which produces a smoother
autocorrelation function.

Next the speech is passed through the clipper, which uses
the previously determined clipping level. The clipper
performs center and infinite peak clipping. If a speech
value exceeds Cl it is assigned a value of 1.0, if it falls
below -Cl it is assigned a value of -1.0, and anything in
between is assigned a value of 0.0.
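The clipping level selection and the three-level clipper can be sketched as below. This is a hedged pure-Python illustration: the function names are hypothetical, and the factor 0.7 is simply one value from the stated range of .6 to .8. A set length of 80 samples corresponds to 10 ms at 8000 Hz.

```python
def clipping_level(frame, set_len=80, factor=0.7):
    """Clipping level Cl from the peaks of the first and third sets of a frame."""
    first_peak = max(abs(v) for v in frame[:set_len])
    third_peak = max(abs(v) for v in frame[-set_len:])
    # Using the smaller peak avoids over clipping when the level changes
    # strongly between the start and the end of the frame.
    return factor * min(first_peak, third_peak)


def center_and_peak_clip(frame, cl):
    """Center and infinite peak clipping: outputs are limited to +1, 0, and -1."""
    return [1.0 if v > cl else -1.0 if v < -cl else 0.0 for v in frame]


# Example on a hypothetical 240-sample frame.
frame = [0.0, 5.0, -5.0, 1.0] * 60
clipped = center_and_peak_clip(frame, clipping_level(frame))
```

Because the clipped signal takes only three values, the lag products in the following autocorrelation reduce to sign comparisons, which keeps that step inexpensive.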


A short-time autocorrelation operation is performed on
the clipped speech using:

$$R(k) = \sum_{m} x_1(n+m)\, x_2(n+m+k)$$

where $x_1$ and $x_2$ have been windowed by rectangular windows 80
and 240 samples long respectively. Also, $n$ is the frame
edge, $m$ is the location in the frame being autocorrelated
and $k$ is the current autocorrelation lag. From this it can
be seen that lags as large as 160 samples can occur. With
speech sampled at 8000 Hz, pitch periods as large as
20 ms (or equivalently 50 Hz) or as small as 2 ms (or
equivalently 500 Hz) can be found. This range is sufficient
to determine the fundamental excitation period. The pitch
period of a frame is determined by finding the first
autocorrelation value (beyond 2 ms) that exceeds 80% of $R(0)$.
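The pitch decision described above can be sketched as follows. This is an illustrative pure-Python sketch with a hypothetical function name; it assumes the input is one 240-sample frame that has already been center and peak clipped, and it returns 0 when no lag qualifies, which is a convention of the sketch rather than of the thesis.

```python
def autoc_pitch_period(clipped, min_lag=16, max_lag=160, ratio=0.8, short_len=80):
    """Return the pitch period in samples, or 0 if no lag exceeds ratio * R(0).

    The short (80-sample) window is applied to the first factor and the full
    240-sample frame to the second, so lags up to 160 samples stay in range.
    """
    def r(k):
        return sum(clipped[m] * clipped[m + k] for m in range(short_len))

    r0 = r(0)
    if r0 <= 0.0:
        return 0
    # min_lag = 16 samples is 2 ms at 8000 Hz.
    for k in range(min_lag, max_lag + 1):
        if r(k) > ratio * r0:
            return k
    return 0


# Example: a clipped pulse train with a 50-sample period (160 Hz at 8000 Hz).
frame = [1.0 if n % 50 == 0 else 0.0 for n in range(240)]
period = autoc_pitch_period(frame)
```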

The second pitch detector to be used is the simplified
inverse filtering technique (SIFT). Since the basic code
and a detailed description of SIFT have been published in
Ref 9, a less detailed description than the one for the
previous pitch detector will be given. Figure III-2 is a
block diagram of the SIFT pitch detector. A block of 400
speech samples (sampled at 8 kHz) is low pass filtered to a
bandwidth of 900 Hz and then decimated by a 5 to 1 ratio.
The autocorrelation method of LPC analysis is used to find
the coefficients of a 4th-order inverse filter. The speech
signal, now at a 1600 Hz sampling rate due to decimation, is
passed through the inverse filter to produce a spectrally
flattened signal which is then autocorrelated. The pitch
period is found by parabolic interpolation in the region of
the peak of the autocorrelation function. The
voiced/unvoiced decision is made on the basis of the
amplitude of the peak of the autocorrelation function. In
addition, a silence detector is used to determine if
speech or silence is present. The silence detector computes
the energy in a frame; if it exceeds the silence
threshold the frame is considered speech, otherwise it is
considered silence.
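Of the SIFT steps, the parabolic interpolation is the least obvious, so a small sketch is given here. It is a generic three-point parabola fit in pure Python with hypothetical function names; it is not the published code of Ref 9, and the modified version used in the thesis may differ in detail. Because the decimated signal is sampled at 1600 Hz, one lag step is 0.625 ms, and the interpolation recovers a finer estimate of the peak position.

```python
def parabolic_peak(r, k):
    """Refine the integer peak index k of sequence r using its two neighbours."""
    y0, y1, y2 = r[k - 1], r[k], r[k + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return float(k)  # the three points are collinear; keep the integer lag
    # Vertex of the parabola through (k-1, y0), (k, y1), (k+1, y2).
    return k + 0.5 * (y0 - y2) / denom


def lag_to_period_ms(lag, fs=1600.0):
    """Convert a (possibly fractional) lag at sampling rate fs to milliseconds."""
    return 1000.0 * lag / fs


# Example: samples of a parabola whose true maximum lies at 5.3.
r = [-(x - 5.3) ** 2 for x in range(10)]
refined = parabolic_peak(r, 5)
```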

As  a  final  output,  both  pitch  detectors  provide  the 
following  output  information: 

1)  Whether  the  frame  is  silent  or  not; 

2)  voiced  or  unvoiced; 

3)  if  voiced,  what  the  pitch  period  is. 


Speech  Analysis 

The  speech  analysis  technique  used  in  this  thesis  is  the 
standard  autocorrelation  method  where  the  short-time 
autocorrelation  computation  was  replaced  with  a  recursive 
autocorrelation  computation.  Figure  III-3  is  a  block 
diagram  of  the  analysis  system.  Basically  the  system 
requires  three  operations:  first  the  autocorrelation  is 
performed  on  some  segment  of  speech,  second  the  LPC 
predictor  coefficients  are  found  using  Durbin's  recursion 
(Ref  7),  and  finally  the  gain  is  determined. 

In addition to the system described above, a module was
created to preprocess speech before the autocorrelator.
This module is an FIR pre-emphasis filter of the following
form:

$$y(n) = x(n) - 0.9\, x(n-1) \tag{3.2}$$

where $x(n)$ is the input to the filter and $y(n)$ is the
output. The pre-emphasis filter is used to make the
spectrum as flat as possible, which has been shown to
minimize the effect of roundoff errors causing
ill-conditioned matrices (Ref 9).
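A first-order pre-emphasis filter of this kind can be sketched in a few lines. The function name is hypothetical, and the coefficient value `mu=0.9` is an assumption of the sketch; the sample before the start of the signal is taken to be zero.

```python
def pre_emphasis(x, mu=0.9):
    """First-order FIR pre-emphasis: y(n) = x(n) - mu * x(n-1), with x(-1) = 0."""
    return [x[n] - mu * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


# A constant (low-frequency) input is strongly attenuated after the first sample.
y = pre_emphasis([1.0, 1.0, 1.0])
```

The filter attenuates low frequencies relative to high frequencies, which offsets the downward spectral tilt of voiced speech and so flattens the spectrum seen by the autocorrelator.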

The autocorrelation method used in this thesis has been
discussed extensively in chapter II and the system design is
shown in figure II-1. However, the choice of the order (p)
of the predictor polynomial has not been discussed. It is
well known that p must be large enough to account for both
the vocal tract and glottal pulse effects. However, as p
increases the computational burden increases. The value of
p is chosen by observing the normalized rms prediction
error versus the predictor order p for sections of voiced
and unvoiced speech (Ref 10). For p on the order of 13-14
the error essentially flattens out, showing only small
decreases as p is increased further. It was also found that
the choice of p depends primarily on the sampling rate
(independent of the LPC method used). In this thesis an
8 kHz sampling rate is used, which requires approximately 8
poles to represent the vocal tract. Additionally, 2-4 poles
are required to adequately represent the source excitation
spectrum and the radiation load. Thus p was chosen to be 10.

As discussed in chapter II, for the autocorrelation method
the matrix equation for solving for the predictor
coefficients is of the form:

$$\sum_{k=1}^{p} \alpha_k R_n(|i-k|) = R_n(i), \qquad 1 \le i \le p \tag{3.3}$$

The  most  efficient  method  known  for  solving  this  set  of 
equations  exploits  the  Toeplitz  nature  of  the  matrix  of 
coefficients.  This  method  is  Durbin's  recursive  procedure 
(Ref  7)  which  is  given  in  equations  3.4  to  3.8: 


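$$E^{(0)} = R(0) \tag{3.4}$$

$$k_i = \left[ R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j) \right] \Big/ E^{(i-1)}, \qquad 1 \le i \le p \tag{3.5}$$

$$\alpha_i^{(i)} = k_i \tag{3.6}$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\,\alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \tag{3.7}$$

$$E^{(i)} = (1 - k_i^2)\,E^{(i-1)} \tag{3.8}$$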


Equations 3.5 to 3.8 are solved recursively for
i=1,2,...,p and the final solution is:

$$\alpha_j = \alpha_j^{(p)}, \qquad 1 \le j \le p \tag{3.9}$$

The last part of the analysis system determines the
system gain for the current frame. This is found using the
following relation:

$$G^2 = R(0) - \sum_{k=1}^{p} \alpha_k R(k) \tag{3.10}$$

where  G  is  the  gain. 

As  a  final  output  the  analysis  system  provides  the 
following  information  for  each  frame: 

1)  A  set  of  predictor  coefficients; 

2)  the  gain. 
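
Durbin's recursion and the gain relation described above can be sketched as follows. This is a minimal illustration, assuming NumPy and an autocorrelation sequence R(0)..R(p) that has already been computed (recursively or otherwise); the function name `durbin` and its return layout are hypothetical and do not reproduce the thesis software.

```python
import numpy as np

def durbin(R, p):
    """Solve the autocorrelation normal equations by Durbin's recursion.

    R : autocorrelation values R(0)..R(p)
    Returns (alpha, k, G): predictor coefficients alpha_1..alpha_p,
    the intermediate k_1..k_p values, and the gain G.
    """
    R = np.asarray(R, dtype=float)
    alpha = np.zeros(p + 1)            # alpha[0] is unused
    k = np.zeros(p + 1)
    E = R[0]                           # initial prediction error energy
    for i in range(1, p + 1):
        # R(i) minus sum over j of alpha_j * R(i-j), using the order i-1 solution
        acc = R[i] - np.dot(alpha[1:i], R[i - 1:0:-1])
        k[i] = acc / E
        prev = alpha.copy()
        alpha[i] = k[i]
        for j in range(1, i):
            alpha[j] = prev[j] - k[i] * prev[i - j]
        E = (1.0 - k[i] ** 2) * E      # updated prediction error energy
    G = np.sqrt(R[0] - np.dot(alpha[1:], R[1:p + 1]))
    return alpha[1:], k[1:], G
```

For the autocorrelation sequence R(k) = 0.5^k, a second-order call returns the coefficients 0.5 and 0, as expected for a first-order process. In the vocoder p would be 10.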


Speech  Synthesis 

The speech synthesizer is composed of five basic parts:
a pulse generator, a noise generator, a switch, a
receiver filter, and a D/A converter. These elements are
shown in figure III-4. The receiver filter can be thought
of as a vocal tract that is excited by the pulse generator
or noise generator for voiced or unvoiced sounds
respectively.

If the current speech to be produced is unvoiced
(determined earlier by the pitch detector), the switch
connects the noise generator to the receiver filter input.
Conversely, for voiced speech the switch connects the pulse
generator to the receiver filter input.


For voiced speech the pulse generator can be thought of
as a three part system consisting of an impulse train
generator, a glottal pulse model, and an amplitude control,
where the system's input is a pitch period and the output is
a glottal pulse shape. The system is shown in figure III-5.
Current pitch information will come from the pitch detector,
causing the glottal pulse model to be impulsed at the pitch
period. The output pulse will then be multiplied by the
current gain value received from the speech analysis system.

In experiments, Rosenberg (Ref 11) discovered that good
quality synthetic speech can be obtained using excitation
functions which can be specified by trigonometric or
polynomial functions uniformly throughout the vowel portions
of an utterance. The results of his experimentation showed
that the most preferred pulse shapes have only a single
slope discontinuity, which is at the closing end of the
pulse. One of the waveshapes rated highest in Rosenberg's
tests, and the waveshape used in this thesis, is:

$$g(n) = \begin{cases} \tfrac{1}{2}\left[1 - \cos(\pi n / T_P)\right], & 0 \le n \le T_P \\ \cos\left(\pi (n - T_P) / 2T_N\right), & T_P \le n \le T_P + T_N \\ 0, & \text{elsewhere} \end{cases} \tag{3.11}$$

where T_P and T_N are the portions of the waveform with a
positive and negative slope respectively. The tests also
demonstrated that there is no particular best value for T_P
and T_N. However, combinations having very small opening or
closing times, or opening times less than or approximately
equal to the closing time, were not ranked well in listening
tests. The values of T_P and T_N chosen for this thesis are
discussed in chapter IV.
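
The trigonometric glottal pulse can be generated with a few lines of code. The sketch below assumes NumPy and integer opening and closing times in samples; the function name `rosenberg_pulse` and the example values of 20 and 8 samples are hypothetical and are not the values used in the thesis.

```python
import numpy as np

def rosenberg_pulse(Tp, Tn):
    """Trigonometric glottal pulse with opening time Tp and closing time Tn.

    Rising half-cosine for 0 <= n <= Tp, falling quarter-cosine for
    Tp < n <= Tp + Tn. Samples outside that range are zero and not returned.
    """
    n = np.arange(Tp + Tn + 1)
    g = np.zeros(Tp + Tn + 1)
    opening = n <= Tp
    g[opening] = 0.5 * (1.0 - np.cos(np.pi * n[opening] / Tp))
    closing = n > Tp
    g[closing] = np.cos(np.pi * (n[closing] - Tp) / (2.0 * Tn))
    return g

pulse = rosenberg_pulse(20, 8)   # hypothetical opening and closing times
```

The pulse rises smoothly from zero to one over the opening phase and falls back to zero over the closing phase, so the only slope discontinuity is at the closing end.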

In contrast to the pulse generator, which produces a
pulse, the noise generator produces impulses of varying
magnitude. For discrete-time models (such as this), the
random number generator provides a source of flat-spectrum
noise. The probability distribution of the noise was chosen
to be normal, although there seems (Ref 10) to be no
distribution which is considered the best.

The algorithm for the normal generator requires a
uniformly distributed random generator as an input. A prime
modulus multiplicative linear congruential generator
(PMMLCG) was chosen as the uniform random number generator.
This decision was made purely because Ref 5:227 had
available a Fortran listing of a proven uniform generator
for 16 bit machines such as the Data General Eclipse. The
problem with many uniform random number generators is that
they have undesirable statistical properties (although this
may not be a problem in speech). Rather than going into the
details of PMMLCG systems, which are not pertinent to speech
production, the reader is referred to Ref 5 for more information.

The algorithm for the normal generator was chosen as the
best of the current normal generating algorithms available
using the criteria of efficiency (speed and storage
requirements) and quality or accuracy of the output (Ref 6).
The prime benefit of this algorithm is its speed with
average storage requirements compared to other algorithms.
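
The two-stage structure of the noise generator (a uniform PMMLCG feeding a normal generator) can be sketched as follows. This is an illustration only. The modulus 2^31 - 1 and multiplier 16807 are those of a widely published generator and are assumed here, since the constants of the Ref 5 listing are not reproduced in this chapter. The Box-Muller transform is substituted for the normal algorithm of Ref 6, which is not named in the text. The names `pmmlcg` and `normal_from_uniform` are hypothetical.

```python
import math

M = 2**31 - 1   # prime modulus (assumed, not taken from the thesis)
A = 16807       # multiplier (assumed, not taken from the thesis)

def pmmlcg(seed):
    """Yield uniform variates in (0, 1) from z <- A * z mod M."""
    z = seed
    while True:
        z = (A * z) % M
        yield z / M

def normal_from_uniform(uniform):
    """Box-Muller transform: one standard normal variate per pair of uniforms."""
    while True:
        u1, u2 = next(uniform), next(uniform)
        yield math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

noise_source = normal_from_uniform(pmmlcg(12345))
sample = next(noise_source)
```

Because the modulus is prime and the seed is nonzero, the uniform generator never produces zero, so the logarithm is always defined. Python integers do not overflow, so none of the 16 bit arithmetic arrangements a listing for the Eclipse would need appear here.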

Referring back to figure III-4, the excitation to the
synthesis filter varies with time as would be the case in a
true glottis. In a similar manner the synthesis filter
changes its filter coefficients with time, similar to the
vocal tract changing shape. This is only an analogy, and the
particular values of the filter coefficients are a function
of what type of digital filter is chosen and how the
parameters were created (for example, autocorrelation,
covariance etc.).

Generally filter parameters are estimated at regular
intervals in regions of voiced speech, and filter
coefficients are updated at the beginning of each pitch
period. For unvoiced speech the coefficients are updated
once at the beginning of each frame. This thesis requires a
slightly different approach since filter coefficients are
available for each speech sample. One interesting question
is: how often is useful (new) information available to
update the synthesis filter? This imposes the requirement
that the synthesis filter must be able to be updated at any
time. As stated earlier, typical systems usually update
their parameters each pitch period (called pitch synchronous
synthesis), which has been found to be more effective than
updating parameters once per frame (called asynchronous
synthesis). Unfortunately, the pitch synchronous synthesis
technique requires filter coefficients that must be found by
interpolating the predictor coefficients, and the
interpolated coefficients are not always stable. This
problem, although common and easily solved, does not exist
in this synthesis system because the recursive nature of
the analysis system makes filter coefficient information
available at any synthesized speech sample.

The  speech  analysis  system  provides  filter  coefficients 
configured  in  the  direct  form.  Using  some  simple 
transformations  other  filter  forms  could  be  used  for  the 
synthesis  filter.  But  since  the  thrust  of  this  thesis  isn't 
affected  by  the  choice  of  synthesis  filters,  the  direct  form 
filter  was  chosen  because  it  is  easier  to  implement. 
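
A direct form synthesis filter whose coefficients may change at every sample can be sketched as follows. The sketch assumes NumPy, predictor coefficients with the sign convention s(n) = G u(n) + sum of alpha_k s(n-k), and zero initial filter state; the function name `synthesize` and the per-sample coefficient arrays are hypothetical.

```python
import numpy as np

def synthesize(excitation, alpha_track, gain_track):
    """Direct form all-pole synthesis with per-sample coefficient updates.

    excitation  : N excitation samples u(n)
    alpha_track : N x p array, row n holds alpha_1..alpha_p used at sample n
    gain_track  : N gain values G(n)
    """
    excitation = np.asarray(excitation, dtype=float)
    alpha_track = np.asarray(alpha_track, dtype=float)
    p = alpha_track.shape[1]
    s = np.zeros(len(excitation) + p)       # p leading zeros are the initial state
    for n in range(len(excitation)):
        past = s[n:n + p][::-1]             # s(n-1), s(n-2), ..., s(n-p)
        s[n + p] = gain_track[n] * excitation[n] + np.dot(alpha_track[n], past)
    return s[p:]
```

Holding each row of `alpha_track` constant for 80 samples would correspond to a 10 msec update interval, while a new row every sample corresponds to the .125 msec case.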

Since  a  pre-emphasis  filter  was  used  in  the  analysis 
portion  of  the  system,  a  de-emphasis  filter  must  be  used  in 
the  synthesis  portion.  The  synthesized  speech  is  passed 
through  a  filter  of  the  following  form: 

$$y(n) = x(n) + 0.9\,y(n-1)$$

where  x(n)  is  the  filter  input  and  y(n)  is  the  filter 
output. 
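
That the de-emphasis filter undoes the pre-emphasis filter can be checked numerically. The sketch below assumes NumPy, a coefficient of 0.9 in both filters, and zero initial conditions; the function name `de_emphasis` and the random test signal are hypothetical.

```python
import numpy as np

def de_emphasis(x, mu=0.9):
    """First-order recursive de-emphasis: y(n) = x(n) + mu * y(n-1)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    prev = 0.0                      # y(-1) is assumed to be zero
    for n, xn in enumerate(x):
        prev = xn + mu * prev
        y[n] = prev
    return y

# Round trip: pre-emphasize a test signal, then de-emphasize it.
rng = np.random.default_rng(0)
speech = rng.standard_normal(64)            # stand-in for a speech segment
emphasized = speech.copy()
emphasized[1:] -= 0.9 * speech[:-1]         # y(n) = x(n) - 0.9 x(n-1)
restored = de_emphasis(emphasized)
```

Here `restored` matches `speech` to within floating point error, because the two filters are exact inverses when the coefficients agree.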

As  a  final  output  of  the  synthesis  system  the  following 
information  is  provided: 

1)  An  integer  file  of  speech. 

In summary, the vocoder is composed of a pitch detector,
a speech analyzer, and a speech synthesizer. The pitch
detector determines if a speech segment is silence, voiced
or unvoiced, and if voiced, its pitch period. Then the speech
analyzer uses recursive autocorrelation with Durbin's
recursion algorithm to compute LPC predictor coefficients.
Speech is then synthesized by the speech synthesizer using
information provided by the pitch detector and speech
analyzer.


Block Diagram of the SIFT Pitch Detector


Figure III-1 Block Diagram of the AUTOC Pitch Detector


Figure III-3 Block Diagram of Speech Analyzer


Figure III-4 Block Diagram of Speech Synthesizer


Figure III-5 Block Diagram of Pulse Generator


IV.  Results  and  Conclusions 

This  chapter  will  be  divided  into  five  sections.  The 
first  of  these  will  discuss  tests  that  were  performed  to 
find  the  best  quality  of  speech  by  varying  parameters  such 
as  silence  thresholds,  glottal  pulse  shape,  etc.  The  second 
section  describes  some  of  the  vocoder  outputs  using  time 
domain  plots  and  3-dimensional  spectral  plots  (log 
magnitude).  The  third  section  will  look  at  the  data  rate  of 
the  vocoder  and  the  effects  of  using  recursion.  The  fourth 
section  looks  at  some  of  the  effects  noise  has  on  the  RLPC 
system.  The  final  section  will  be  the  conclusion  section. 

Tests 

The goal of the tests described in this section was to
find the combination of parameters that caused the vocoder
to produce the best "quality of speech." Unfortunately, a
measure of the "quality of speech" is not easy to define. For
the purposes of this thesis informal listening tests were used.
The parameters that were varied are essentially
independent, thereby allowing optimization tests to be
performed on one parameter at a time.

The tests were performed using the same utterance as an
input to the vocoder for each parameter change. Each
parameter was varied in such a way that the whole
range of reasonable values was checked. For example, in
the case of the silence threshold one value was chosen large
enough to clip speech information and the other extreme was
made small enough that it was obvious that excessive noise was
input. By using this method of choosing extremes, optimum
parameter values would not be outside of the range of
testing and thus would not be missed. By varying parameters
between the extremes an informal listening group was able to
listen to the vocoder output and choose the best parameter.

Six speech files (S1-S6) were available in the Data
General Nova-Eclipse system. The files are two's complement
integer files with an absolute maximum value of 32,768. Out
of the six possible files two were chosen, one spoken by
a man and the other by a woman. There was no reason to
choose a particular file but it seemed reasonable to choose
one male and one female. The two utterances chosen were S1
and S2 (number 1 and 2, below, respectively). The utterance
in each file is listed below:


1) S1-"The pipe began to rust while new"-female speaker

2) S2-"Thieves who rob friends deserve jail"-male speaker

3) S3-"Add the sum to the product of these two"-female speaker

4) S4-"Open the crate but don't break the glass"-male speaker

5) S5-"Oak is strong and also gives shade"-male speaker

6) S6-"Cats and dogs each hate the other"-male speaker


Also  three  additional  speech  files  (the  speaker  was 
Major  Kizer)  were  used  and  are  listed  below: 


1) file "ONE"-containing "one...two...three...four"

2) file "FIVE"-containing "five...six...seven...eight"

3) file "ZERO"-containing "zero...nine...ten"


The first test to be performed was the silence threshold
test. The goal of this test was to determine the silence
threshold that eliminated unwanted noise (during intervals
without speech) but did not eliminate any speech. The
silence detection is accomplished in the pitch detector.
Two pitch detectors were used in this thesis and, because of
processing differences, different silence thresholds are
needed for each pitch detector. Using the SIFT pitch
detector the best silence threshold was found to be 0.1
(which equates to 0.1% of the rms energy in a sine wave with
the maximum integer value of 32,768). Using the AUTOC pitch
detector the best silence threshold was found to be 5.0.
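
One possible reading of the silence threshold is sketched below. It assumes that the threshold is a percentage of the mean-square value of a full-scale sine wave and that the decision is made on the mean-square value of each frame; the thesis does not spell out the exact computation in this section, so the function name `is_silent` and this interpretation are assumptions.

```python
import numpy as np

FULL_SCALE = 32768.0   # maximum integer magnitude of the speech files

def is_silent(frame, threshold_percent=0.1):
    """Return True if the frame energy is below the silence threshold.

    The reference is the mean-square value of a full-scale sine wave.
    """
    frame = np.asarray(frame, dtype=float)
    reference = FULL_SCALE ** 2 / 2.0
    energy = np.mean(frame ** 2)
    return bool(energy < (threshold_percent / 100.0) * reference)
```

Under this reading, a sine wave at 1% of full-scale amplitude (0.01% of the reference energy) is classified as silence with the 0.1 threshold, while a sine wave at 10% of full-scale amplitude is not.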

For the AUTOC pitch detector two additional tests had to
be performed. These tests found the optimum clipping level
and autocorrelation threshold, both of which were discussed
in chapter III. These were the only tests in which
informal listening tests were not used exclusively. Instead
the AUTOC pitch detector output was plotted for the speech
files S1-S6. These plots were compared with a pitch period
plot of the original speech file. Unfortunately, the time
scale information was not available for the original pitch
period plot. However, general pitch trends could be
observed to determine if the pitch detector was close. See
figure IV-1 (pitch plot of S1) for an example of this. Of
course this really is only a qualitative test and an actual
listening test is required to verify the pitch detector's
quality. The best choice found for the clipping level was
0.6 and 0.3 for the autocorrelation threshold. These
parameter choices were then verified using informal
listening tests.
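
The roles of the two AUTOC parameters can be illustrated with a simplified center clipping autocorrelation detector. This is a sketch under stated assumptions, not the detector of chapter III: the clipping level is taken as a fraction of the frame's peak magnitude, the autocorrelation threshold as a fraction of the zero-lag value, and the lag search range of 20 to 160 samples is hypothetical, as is the function name `autoc_pitch`.

```python
import numpy as np

def autoc_pitch(frame, clip_level=0.6, ac_threshold=0.3,
                min_lag=20, max_lag=160):
    """Return the pitch period in samples, or 0 if the frame looks unvoiced."""
    frame = np.asarray(frame, dtype=float)
    peak = np.max(np.abs(frame))
    if peak == 0.0:
        return 0
    c = clip_level * peak
    # Center clipping: remove everything within +/- c, shift the rest toward zero.
    clipped = np.where(frame > c, frame - c,
                       np.where(frame < -c, frame + c, 0.0))
    r0 = np.dot(clipped, clipped)
    if r0 == 0.0:
        return 0
    lags = np.arange(min_lag, max_lag + 1)
    r = np.array([np.dot(clipped[:-k], clipped[k:]) for k in lags])
    best = int(np.argmax(r))
    # Voiced only if the largest peak is a big enough fraction of R(0).
    return int(lags[best]) if r[best] >= ac_threshold * r0 else 0
```

A larger clipping level removes more formant structure but risks discarding low-amplitude pitch pulses, and a larger autocorrelation threshold classifies more frames as unvoiced.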

The next test compared the quality of speech produced
using each pitch detector. See figure IV-1 for a pitch
period plot of S1 using each pitch detector and a plot of
the original "hand-painted" pitch period plot (note: the
hand-painted plot is not on the same time scale as the other
two). The best speech was produced using the SIFT pitch
detector in the vocoder. The synthesized speech using the
AUTOC pitch detector tended to waver (even after trying
linear smoothing, median smoothing, and the two combined).
Since the primary purpose of this thesis is to create a
system which allows the user to experiment with recursive
LPC, the SIFT pitch detector was chosen to be used in the
remainder of the tests. In choosing the SIFT pitch
detector, higher quality speech will be produced. Therefore
subtle LPC effects will not be masked by a lower quality
pitch detector.

The next test found a ratio between unvoiced and voiced
impulse values. In a qualitative way this allowed the
energy in voiced and unvoiced speech to be modified. To
understand why this ratio is necessary consider the digital
filter synthesizer. For unvoiced speech, energy is being
continually input to the digital filter (where the actual
energy is determined by the random process used). However,
in the case of voiced speech the digital filter is only
impulsed at the beginning of each pitch period. To account
for this difference in energies, informal listening tests
were used to find the best unvoiced/voiced multiplier value.
Additionally, this scaling of the random generator could be
thought of as decreasing the variance of the random process.
This number turned out to be smaller than 1.0, as would be
expected since unvoiced speech has less energy than voiced
speech. The actual value found was 0.1. For larger values
the synthesized speech tended to be scratchy sounding, and
smaller values produced speech without good quality
fricatives.
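
The effect of the unvoiced/voiced multiplier on the excitation can be sketched as follows. The sketch assumes NumPy, a unit impulse at the start of each pitch period for voiced speech, and unit-variance normal noise scaled by the multiplier for unvoiced speech; the frame gain is left to the synthesis stage. The function name `make_excitation` is hypothetical.

```python
import numpy as np

def make_excitation(n_samples, voiced, pitch_period=0, uv_gain=0.1, rng=None):
    """Build an excitation segment for the synthesis filter.

    Voiced: unit impulses spaced pitch_period samples apart.
    Unvoiced: normal noise scaled by the unvoiced/voiced multiplier.
    """
    if voiced:
        u = np.zeros(n_samples)
        u[::pitch_period] = 1.0
        return u
    rng = np.random.default_rng() if rng is None else rng
    return uv_gain * rng.standard_normal(n_samples)
```

With a multiplier of 0.1 the noise variance is 0.01, which is the sense in which the scaling decreases the variance of the random process.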

The next test used each of the six speech files
described earlier as an input to the vocoder. In general,
the processed speech was understandable, with
the speech produced by male voices being generally of higher
quality than that produced by female voices. The final
observation is that the reconstructed speech did not seem to
have the tonal quality (or timbre) present in the original
speech.

Whether to use a glottal impulse or a glottal pulse
shape was the subject of the next test. As stated in
chapter III, Rosenberg (Ref 11) found that various glottal
pulse wave shapes provide good quality speech. The vocoder
created for this thesis allows either a trigonometric
waveform or an impulse to be used for the glottal pulse. One of
the better waveforms used in Rosenberg's tests occupied
approximately forty percent of the pitch period. When a
glottal pulse of this duration was used the speech quality
was very poor. Only when the glottal pulse occupied ten
percent or less of a pitch period did the quality of the
speech become reasonable. However, when an impulse was used
the quality was best. The impulse seemed to produce
crisper, cleaner speech, whereas the glottal pulse shaped
waveform tended to be muffled sounding. Although other
wave shapes could have been chosen from Rosenberg's tests
(the trigonometric was considered one of the best in the
tests), this author believes that the shapes of the waveforms
were similar enough that another waveform choice would not
have changed the results.

The final test varied the recursive autocorrelation
window shape. See figure IV-2 (a) for a plot of the window
shape with alpha=.97 and figure IV-2 (b) for alpha=.98.
In tests of speech quality for various window shapes,
Makhoul and Cosell (Ref 8) varied alpha (see chapter II
equation (2.23) to see where alpha is used) from .97467 to
.97979 (actually they used the square of alpha, but this
author took the square root for easier comparison). However,
in Barnwell's testing alpha was varied from .97900 to .98500.
In this test alpha was varied from .97750 to .98500. The
quality of the speech did not vary much over this range of
alpha. The value .98000 was the best choice for alpha
but  it  wasn't  a  great  deal  better  than  the  extremes  of 


VOCODER  OUTPUT 


This section has two purposes: the first is to
demonstrate possible outputs available from the vocoder
system and the second is to show some of the effects and
benefits of the recursive autocorrelation technique.

As discussed in the last section, initially six speech
files were used (S1-S6). Figures IV-4.1 to IV-4.8 are plots
of the original speech file S1 ("The pipe began to rust
while new"). These and all other plotted vocoder outputs,
unless otherwise specified, were created using the SIFT
pitch detector and the following vocoder parameters:

1) window shape- .98000

2) silence threshold- .10000

3) unvoiced to voiced gain- .10000

4) frame separation- 10

5) pre/de-emphasis- used

6) glottal pulse- impulse used

Before getting into the details of using the speech (S1),
some basic information on how the plots of the speech files
are structured will be given. First, the figures were
ordered so that the figure number represents one complete
speech file. The number following the decimal point simply
identifies sequential speech plots. In addition,
plots on a given page are also time sequential (for example,
the second plot from the top follows the first, etc.) because
the speech is continuous. The vertical scale of each plot
is the integer value of speech at a given sample time (one
two-byte two's complement value in the speech file, which
was scaled for plotting). The horizontal scale represents
time (in samples at a sampling rate of 8000 Hz), or more
conveniently each point represents one data point in the
speech file. For the rest of the thesis, unless otherwise
specified, the horizontal axis will be described as a
sequence of data points. From the plot it can be seen that
there are 512 data points per graph, which is conveniently 2
blocks of data. Typically speech files are 88 blocks long,
and from figure IV-4 it can be seen that only 80 blocks
are shown. There was no reason to include plots of silence
from the beginning of the speech file, so they were not
included. One final note: a particular graph (a plot of 2
blocks of data) of a speech file will be identified by a
figure number, a plot number (after the decimal point), and
the number of the graph on its page (after a second decimal
point), where graphs are numbered from top to bottom. For
example, the third graph from the top of figure IV-4.3 would
be identified as figure IV-4.3.3. For details
on creating these plots see appendix C.
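
The sample, block, and time units used in the plots are related by simple arithmetic, sketched here. The 8000 Hz sampling rate is stated in the text and the block size of 256 samples follows from 512 data points being 2 blocks; the function names `samples_to_msec` and `blocks_to_seconds` are hypothetical.

```python
FS_HZ = 8000              # sampling rate stated in the text
SAMPLES_PER_BLOCK = 256   # implied by 512 data points being 2 blocks

def samples_to_msec(n_samples, fs_hz=FS_HZ):
    """Duration in milliseconds of n_samples data points."""
    return 1000.0 * n_samples / fs_hz

def blocks_to_seconds(n_blocks, fs_hz=FS_HZ):
    """Duration in seconds of n_blocks blocks of speech data."""
    return n_blocks * SAMPLES_PER_BLOCK / fs_hz

graph_msec = samples_to_msec(512)      # one graph of 2 blocks
file_seconds = blocks_to_seconds(88)   # a typical 88 block speech file
```

One graph therefore spans 64 msec of speech, a single data point spans 0.125 msec, and a typical 88 block file holds about 2.8 seconds of speech.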

Looking  at  an  isolated  speech  plot  does  not  provide  one 
with  much  information.  For  example,  looking  at  figure 
IV-4. 1.5  only  tells  one  that  part  of  the  speech  is  voiced. 
Most  people  would  not  recognize  this  figure  as  the  word 
"The".  Additionally,  it  is  difficult  to  separate  the 
beginning  and  end  of  two  connected  phonemes. 

To make the speech plots more usable, the various letters
of an utterance have been included under the plot. A slash
"/" indicates the approximate beginning, ending or
separation point between parts of the utterance. For
example, the "/" in figure IV-4.6.5 separates the "wh" and
"i" sounds in "while". Also, the utterance is printed at the
bottom of the figure, with the portion of the utterance
that is contained on the page of plots underlined.

A short discussion of the types of phonemes and some
example phonemes will be presented to aid in understanding
some of the things that are occurring in the speech plots.

The vowels are produced by quasi-periodic glottal
excitation. The manner in which the cross-sectional area
varies along the vocal tract determines the resonant
frequencies (formants) and therefore the sound (Ref 10). The
vowels are /IY/, (beet); /I/, (bit); /E/, (bet); /AE/,
(bat); /UH/, (but); /A/, (hot); /OW/, (bought); /U/, (foot);
/OO/, (boot); and /ER/, (bird). An example of a vowel is
the "oo" in "to" in figure IV-4.5.1.

The "th" in "the" in figure IV-4.1.5 is a voiced fricative.
The voiced fricatives are /v/, /th/, /z/ and /zh/. A voiced
fricative is produced by vocal cords that vibrate (one
excitation source at the glottis) and a vocal tract
constriction at some point forward of the glottis, which
produces turbulence in the neighborhood of the constriction.
Unvoiced fricatives are essentially the same as voiced
fricatives except the glottal excitation isn't present. The
unvoiced fricatives include /f/, /θ/, /s/ and /sh/. The "s"
in "rust" is an unvoiced fricative; however, it was not very
pronounced in the original speech file. For a better
example of an unvoiced fricative see the "s" in "six" in
figure IV-17.4.

The nasals are produced with glottal excitation and the
vocal tract completely constricted at some point along the
oral passage. With a lowered velum, air flows through the
nasal tract with sound being radiated at the nostrils. An
example of the nasal sounds (including /m/, /n/, /ŋ/) can be
found in figure IV-4.7.4 with "n" in the word "new".

The semivowels sound vowel-like and are
characterized by a gliding transition in vocal tract area
function between adjacent phonemes. Therefore, the acoustic
characteristics of these sounds are influenced by the
context in which they occur. The semivowels include /w/,
/l/, /r/, and /y/. Although the "r" in "to rust" is only
directly connected to the word "rust" in figure IV-4.5.3, it
is clear from the plot that the "r" performs a transition.

Diphthongs are generally considered to be gliding
monosyllabic speech items that start at the articulatory
position for one vowel and then move toward the position
for another vowel. Diphthongs include /eI/, (bay); /oU/, (boat);
/aI/, (buy); /aU/, (how); /oI/, (boy); and /ju/, (you).

Voiced stop consonants /b/, /d/, and /g/ are produced by
building up pressure behind a total constriction somewhere
in the oral tract, then suddenly releasing the pressure.
These transient, noncontinuant sounds are dynamic in nature
and have characteristics that are highly influenced by the
vowel which follows the stop consonant. For an example of a
voiced stop see figure IV-4.3.2 containing "b" in the word
"began".


Unvoiced stop consonants are similar to their voiced
counterparts except that the vocal cords don't vibrate.
The unvoiced stop consonants include /p/, /t/, and /k/. An
example is the "p" in "pipe" in figure IV-4.2.2.

The final phonemes to be mentioned are the affricates.
The affricates (including /tʃ/ and /j/) can be modelled as
the concatenation of the unvoiced stop /t/ and the fricative
/ʃ/.

Figure IV-5 is a plot of synthesized speech from the
RLPC vocoder. In general terms the best way to determine
the quality of speech output from the vocoder is to listen
to it. Unfortunately, as stated before, the quality of
speech is difficult to measure. From the informal
listening tests it was determined that the speech in figure
IV-5 sounded very much like the speech in figure IV-4. Just
a cursory comparison between the figures will show some
similarities and differences. The first similarity is in
the pitch period for voiced speech. As would be expected if
the pitch detector works, voiced speech will consist of
damped sinusoids at the proper pitch period. An example of
this can be seen in figures IV-4.4.1 and IV-5.4.1. There
are more similarities, but what makes the speech different is
probably more interesting.

Probably the most apparent difference is the gain
adjustments. For example, in figures IV-4.3.2-300 and
IV-5.3.2-300 the maximum magnitude in the original speech
varies slowly, whereas the synthesized speech is constant,
then increases and then decreases. Also related to the gain
is how the relative magnitude varies over the whole speech
file. Some words seem to receive more emphasis than others.
One final difference is the lack of finer details on the
synthesized speech waveform. However, this is to be expected
since only ten poles are used in the synthesis system.

In addition to the plots of the original and synthesized
speech, a series of 3-dimensional (3-D) plots are shown in
figure IV-7. Figure IV-6 is a set of labeled axes that are
representative of all the axes in figure IV-7 and other 3-D
plots. The log magnitude response versus frequency is found
from a set of filter (predictor) coefficients. A series
(time sequential) of these plots are combined to produce a
log magnitude versus frequency versus time plot. The
vertical axis (height) is the log magnitude, and although
scale information isn't available, viewing the general trend
as time varies is the primary function of the plots. The
horizontal axis (width) is the frequency (0-4 kHz) and the
diagonal axis (depth) is the time axis, representing a time
span of 512 samples or 64 msec unless otherwise stated. A
table cross referencing the speech plots and formant plots
is given in Table IV-1 (see page IV-24). For details on
creating these plots see appendix C.
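
One slice of such a plot, the log magnitude response of a single set of predictor coefficients, can be computed as sketched below. The sketch assumes NumPy, the all-pole model H(z) = G / (1 - sum of alpha_k z^-k), and an FFT length of 512; the function name `lpc_log_magnitude` is hypothetical and the plotting procedure of appendix C is not reproduced.

```python
import numpy as np

def lpc_log_magnitude(alpha, gain=1.0, n_fft=512, fs_hz=8000):
    """Log magnitude response (dB) of the all-pole filter from 0 to fs/2.

    alpha : predictor coefficients alpha_1..alpha_p
    Returns (freqs_hz, log_mag_db), each of length n_fft // 2 + 1.
    """
    # Denominator polynomial A(z) = 1 - alpha_1 z^-1 - ... - alpha_p z^-p
    a_poly = np.concatenate(([1.0], -np.asarray(alpha, dtype=float)))
    A = np.fft.rfft(a_poly, n_fft)               # A evaluated on the unit circle
    freqs_hz = np.fft.rfftfreq(n_fft, d=1.0 / fs_hz)
    log_mag_db = 20.0 * np.log10(gain / np.abs(A))
    return freqs_hz, log_mag_db
```

Stacking the returned `log_mag_db` rows for coefficient sets taken every 10, 80, or 160 samples would give the frequency by time surface that the 3-D plots display.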

The orientation of the axes varies for different plots
because this author did not have control over axis
orientation (though the correspondence to frequency etc. has
not changed). To aid in comparing the log magnitude plots
with the synthesized speech, the same numbering convention
used on the speech plots will be used, even though each log
magnitude figure will only have one plot per page. For
example, figure IV-7.3.3 is a log magnitude plot of
synthesized speech in figure IV-4.3.3. In addition to the
figure number, a decimal point and then a letter "a", "b" or
"c" will be appended. The letter indicates the time
interval between subsequent log magnitude versus frequency
plots. The letter "a" indicates a separation of 10 samples or
1.25 msec. The letter "b" is used for an 80 sample separation
or 10 msec. Finally, "c" is for a 160 sample separation or
20 msec and only occurs in one figure. In a few figures
(for example, figure IV-7.3.2.a-300) a dash with a number
will follow. The number is the data point at which the 3-D
plot begins. This was done to provide a clearer plot of
vocal information in cases where a considerable portion of
the speech plot is silence. Note: most frames of silence do
not have a corresponding 3-D plot, therefore some figure
sequences will be missing.

One potential problem with the 3-D plots should be noted.
If one follows a formant as it varies with time, its
magnitude often seems to increase and decrease periodically.
For an example see figure IV-20.6 and look at the lowest
frequency formant. As time passes it looks almost like a low
frequency sinusoid. The peaks are seven formant plots apart,
and each formant plot is separated by 10 sample points, so
the peaks are about 70 samples apart. The pitch plot in
figure IV-25 (a) shows that the pitch period is roughly 70
samples. A similar correspondence can be noted in other
plots. One possible explanation is that when the pitch
impulse occurs more energy is in the system, and this causes
the log magnitude of the formant to be larger. There may be
other explanations, but the problem this causes is the same
whatever the origin. The 3-D formant plots are useful for
comparative purposes. However, if the time between formant
plots (log magnitude versus frequency) is not small compared
to the pitch period, successive formant plots may fall on
peaks and valleys, causing a jagged looking plot. See figure
IV-21 for an example of this jagged look, then look at
figure IV-20, which is also a 3-D plot of "FIVE". Figure
IV-20 requires more plots, but it leaves no question whether
the jaggedness is inherent in the information or is just a
matter of which formant plots were picked.
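
The interaction between plot spacing and pitch period can be
sketched numerically. The following is a minimal illustration,
not part of the thesis software: it assumes a constant pitch
period of 70 samples (the rough value read from figure IV-25
(a)), and the function names are hypothetical. It reports
where each formant plot falls within the pitch period.

```python
def slice_phases(step, period, count):
    """Position (in samples) of each formant plot within the pitch period.

    step   -- separation between formant plots, in samples
    period -- pitch period in samples (assumed constant here)
    count  -- number of successive formant plots to examine
    """
    return [(k * step) % period for k in range(count)]


def slices_per_period(step, period):
    """How many formant plots fall inside one pitch period."""
    return period / step


# 10-sample spacing (1.25 msec): seven plots trace out each pitch period.
print(slice_phases(10, 70, 8), slices_per_period(10, 70))

# 80-sample spacing (10 msec): fewer than one plot per pitch period, so
# neighboring plots come from different pitch periods and each one lands
# at a different point of the within-period rise and fall.
print(slice_phases(80, 70, 8), slices_per_period(80, 70))
```

With the finer spacing the rise and fall inside one pitch
period is traced by several plots. With the coarser spacing
adjacent plots sample different pitch periods at different
positions, so any real change in the formant is mixed with
the within-period variation.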

It is worthwhile to look at some of the various phonemes
and observe what happens in the 3-D plots. First an example
of a vowel will be examined. Figures IV-7.5.1 and IV-7.5.2
are an example of the vowel /oo/ in the word "to". From the
3-D plot it is clear that three distinct formants exist and
that they vary little through the whole utterance, which is
what is expected of vowels.


The next phoneme is the semivowel. As stated earlier,
semivowels are vowel-like with a gliding transition between
adjacent phonemes. A nice example of the semivowel /r/ is
shown in figures IV-7.5.3 through IV-7.5.5. From these
figures the vowel-like formant structure can be seen. Also,
the formant structure shifts from figure to figure, hence
the "glide".

Another  interesting  phoneme  in  figure  IV-7.1.5  is  the 
voiced  fricative  /th/.  Three  main  formants  exist  and  they 
vary  in  intensity  through  the  word  "the"  which  is  not 
surprising  if  one  listens  to  the  word  as  it  is  said. 

The  additional  phonemes  will  not  be  discussed  because 
phoneme  analysis  is  not  the  purpose  of  this  thesis. 
However,  enough  examples  have  been  provided  to  demonstrate 
possible  uses  of  the  output  plots. 

Some additional speech and 3-D plots of the numbers zero
through ten are included in this thesis because they provide
more examples of this system's potential outputs. The
numbers were chosen because of their frequent use.

The speech file of numbers was created as three separate
files. The first file includes utterances of the numbers
"one", "two", "three" and "four". The second speech file
contains the numbers "five", "six", "seven" and "eight".
The third speech file contains the numbers "zero", "nine"
and "ten". A plot of the first speech file is shown in
figure IV-11 and the vocoder output is shown in figure
IV-12. A plot of the pitch is shown in figure IV-17 (a).
Also, 3-D plots of "one" through "four" are shown in figures
IV-13 through IV-16 respectively. Figure IV-18 is the
original plot of the speech file containing "five" through
"eight" and figure IV-19 is a plot of the vocoder output for
this file. Figure IV-20 is the 3-D plot of the number "five"
where the plots are log magnitude versus frequency separated
by 10 data points or 1.25 msec. The format of the
supplementary figure numbers in figure IV-8 is the same as
described earlier.

Figures IV-21 through IV-23 are formant versus time plots
of the numbers "five" through "eight" respectively. Figure
IV-25 (a) is a plot of the pitch period for the numbers
"five" through "eight". Figure IV-26 is the original speech
plot of "zero...nine...ten" and figure IV-27 is the vocoder
output plot. Figures IV-28 through IV-30 are the formant
versus time plots respectively. The pitch period plot of
this last file is shown in figure IV-31. Figure IV-32 is a
set of formant versus time plots of a portion of the word
"five". What makes these plots different is that the formant
plots (log magnitude versus frequency) are only one sample
point or .125 msec apart. The three plots are time
sequential, with the first plot starting at point 138 in
figure IV-19.1.2. Also, figure IV-32 can be compared with
figure IV-20.2, where the first figure starts roughly 14
formant plots in. Probably the most pertinent observation is
that there isn't a great deal of additional information to
be gained (in this case) by observing formant plots at every
sample.

DATA RATE


Although  the  vocoder  system  created  for  this  thesis  is 
not  a  real-time  system  it  is  interesting  to  determine  what 
the  data  rate  of  this  system  is  and  what  it  could  be  if  it 
were  real-time. 

The  primary  reason  for  using  an  LPC  vocoding  system  is 
to  take  advantage  of  the  low  bit  rate  for  transmitting 
speech  information.  The  vocoder  system  consists  of  a 
transmitter,  channel,  and  receiver.  If  an  error  free 
channel  is  assumed  channel  encoding  is  not  required.  The 
transmitter  performs  RLPC  analysis  and  pitch  detection  then 
this  data  is  sent  across  the  channel.  The  receiver  uses  the 
channel  data  to  synthesize  speech. 

For this system, integers require 16 bits and floating
point numbers require 32 bits. To determine the data rate
for this system the worst case will be assumed. This occurs
when both pitch information and predictor coefficients must
be sent. Pitch information is transmitted once every 80
samples, where 16 bits are used for the pitch and 16 bits
are used as a voiced/unvoiced switch. For a system of order
10, 320 bits (10 coefficients of 32 bits each) are needed
each time predictor coefficients are sent. Also, 32 bits are
used for the gain. If the predictor coefficients are updated
every 1.25 msec this is a data rate of 284,800 bits/sec.
Updating every 10 msec results in a data rate of 38,400
bits/sec. Of course this system would never be implemented
transferring this much data (primarily because most of the
data does not carry useful information). What follows is a
description of the data rate of an RLPC vocoder that could
be implemented.
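
The two worst-case figures can be reproduced with a short
calculation. This is a sketch under the assumption that the
32-bit gain accompanies every coefficient update, which is
the reading that matches the quoted totals. The function and
parameter names are hypothetical.

```python
SAMPLE_RATE = 8000  # samples per second


def raw_bit_rate(update_msec, order=10, float_bits=32, int_bits=16,
                 pitch_interval_samples=80):
    """Worst-case bit rate of the unquantized system, in bits/sec."""
    # Predictor coefficients plus the gain, sent at every filter update.
    bits_per_update = order * float_bits + float_bits
    updates_per_sec = 1000.0 / update_msec
    # Pitch period plus voiced/unvoiced switch, sent every 80 samples.
    bits_per_pitch_record = 2 * int_bits
    pitch_records_per_sec = SAMPLE_RATE / pitch_interval_samples
    return round(bits_per_update * updates_per_sec
                 + bits_per_pitch_record * pitch_records_per_sec)


print(raw_bit_rate(1.25))  # 352 bits x 800/sec + 32 bits x 100/sec
print(raw_bit_rate(10))    # 352 bits x 100/sec + 32 bits x 100/sec
```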

For an order p RLPC analysis system a set of p predictor
coefficients, a gain parameter, a voiced/unvoiced parameter,
and the pitch period must be sent across the channel. The
voiced/unvoiced switch requires 1 bit; 6 bit quantization of
the pitch period is adequate; and the gain uses about 5 bits
if distributed on a logarithmic scale (Ref 1). This is a
total of 12 bits each time pitch information is transmitted,
which for this system is once every 10 msec.

If  one  considers  direct  quantization  of  the  predictor 
coefficients  then  8-10  bits  are  required  per  coefficient  to 
ensure  stability  of  the  predictor  polynomial.  This  is  the 
case  because  small  changes  in  the  predictor  coefficients  can 
lead  to  relatively  large  changes  in  the  pole  positions. 

The  obvious  question  is  what  appropriate  parameter  set 
can  be  used  for  coding  and  transmission.  Two  possible 
parameter  sets  include  using  predictor  polynomial  roots  or  a 
set  of  reflection  coefficients.  Predictor  polynomial  roots 
can  be  easily  quantized  in  such  a  way  that  the  resulting 
polynomial  is  stable.  This  stability  is  assured  because  the 
predictor  polynomial  roots  will  be  within  the  unit  circle. 
With this approach 5 bits per root are adequate to preserve
the quality of the synthesized speech (Ref 10).
Unfortunately a processing time penalty is incurred in
solving for the roots. For details of the method using a
set of reflection coefficients see Ref 12.

Assuming the pole method is being used and the predictor
polynomial is order 10 (which is what this vocoder used), 62
bits of information (10 roots of 5 bits each plus the 12
bits above) will be sent across the channel each 10 msec.
This corresponds to 6200 bits/sec. This data rate of 6200
bits/sec is really a maximum data rate for sending
information each 10 msec. First it must be realized that
this data rate assumes voiced speech. If the speech is
unvoiced, pitch information isn't required, which produces a
data rate of 5600 bits/sec. If the speech is silence, data
doesn't have to be sent (except for a 1 bit silence switch).
Since speech is a combination of silence, voiced, and
unvoiced segments, the average data rate is lower than 6200
bits/sec.
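
The quantized rates can be checked in the same way. The
sketch below uses the bit allocations quoted above. The
voiced, unvoiced and silence fractions passed to
average_bit_rate are purely illustrative assumptions, since
no such proportions were measured in this thesis, and all
names are hypothetical.

```python
def frame_bits(voiced, order=10, bits_per_root=5, gain_bits=5,
               pitch_bits=6, switch_bits=1):
    """Bits sent for one 10 msec frame of speech (pole method)."""
    bits = order * bits_per_root + gain_bits + switch_bits
    if voiced:
        bits += pitch_bits  # pitch period is only needed for voiced frames
    return bits


def bit_rate(voiced, frame_msec=10):
    """Bit rate in bits/sec when every frame is of the given type."""
    return frame_bits(voiced) * 1000 // frame_msec


def average_bit_rate(frac_voiced, frac_unvoiced, frac_silence,
                     silence_bits=1, frame_msec=10):
    """Average rate for an assumed mix of frame types (fractions sum to 1)."""
    silence_rate = silence_bits * 1000 / frame_msec
    return (frac_voiced * bit_rate(True)
            + frac_unvoiced * bit_rate(False)
            + frac_silence * silence_rate)


print(bit_rate(True), bit_rate(False))
# Hypothetical mix: 40% voiced, 20% unvoiced, 40% silence.
print(average_bit_rate(0.4, 0.2, 0.4))
```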

There  are  other  factors  that  can  affect  the  bit  rate. 
First,  the  bit  rate  can  be  lowered  by  only  transmitting  new 
information.  For  example,  if  the  predictor  coefficients 
don't  change  much  there  is  no  reason  to  re-transmit  the 
data.  In  a  similar  manner  the  data  rate  can  be  increased  by 
sending  more  information  in  places  where  the  formant 
structure  is  changing  quickly.  So  for  a  given  data  rate  the 
quality  of  speech  may  be  improved  by  being  selective  about 
which  information  is  sent  and  when. 

Noise 

As in other sections, this will not be an in-depth study
of the effects of noise on the RLPC vocoding system.
Basically three steps were taken. First, noise (uniform
density) was added to an original speech file. Secondly, the
pitch period was plotted for some speech files with noise,
and thirdly formant plots were generated using the noise
files. In this section when noise is mentioned it refers
only to artificially added noise and not the noise in the
speech file caused by the original recording.

No attempt was made to measure the effect of various
noise levels on the vocoder output (such as various SNRs in
dB). This is left for another thesis. However, enough noise
was added to the input speech file to degrade the quality of
the vocoder output. The output speech had a "buzz" sound to
it. The SNR was found for the numbers "zero" through "ten"
by finding the ratio of the energy (sum of squares) of the
speech signal without noise to the energy of the noise. This
comes from the basic definition of signal to noise ratio. A
plot of the speech file "five" plus noise is shown in figure
IV-33 and it corresponds directly in time with figure IV-18,
which is the original speech plot of file "five" without
noise. The vocoder output of the noisy speech is shown in
figure IV-34. See Table IV-1 for figure numbers of the other
numbers with noise.

The signal to noise ratios for all the numbers follow:

     1)  "zero"    12.72 dB
     2)  "one"      9.23 dB
     3)  "two"     13.62 dB
     4)  "three"   10.27 dB
     5)  "four"    13.69 dB
     6)  "five"    10.48 dB
     7)  "six"      9.31 dB
     8)  "seven"    7.72 dB
     9)  "eight"    7.39 dB
    10)  "nine"     8.34 dB
    11)  "ten"      8.47 dB

To determine if the choice of starting and stopping
positions (for the SNR computation) was critical, the
beginning and end positions for the word "one" were varied.
First the beginning and end positions were each moved in by
10% of the original interval used for the SNR computation.
The SNR was found to be 10.13 dB, which is a .9 dB increase
over the original. This process was repeated for a 20%
decrease at the beginning and end. The resulting SNR was
10.94 dB, or 1.71 dB greater than the original. These
results indicate that the choice of starting and ending
positions in the SNR computation does not produce drastic
changes, but that signal to noise ratios for different
utterances should be used only for rough comparisons.
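
The SNR computation and the endpoint test can be sketched as
follows. This is a minimal illustration rather than the
program used for the thesis. It assumes that the clean and
noisy files are sample aligned, so the noise is their
difference, and that "decreased by 10%" means each endpoint
is moved inward by 10% of the interval length. The function
names are hypothetical.

```python
import math


def snr_db(clean, noisy, start=0, stop=None):
    """SNR in dB over samples start..stop-1 of two aligned sequences."""
    if stop is None:
        stop = len(clean)
    signal_energy = sum(s * s for s in clean[start:stop])
    # The added noise is the difference between noisy and clean speech.
    noise_energy = sum((x - s) ** 2
                       for s, x in zip(clean[start:stop], noisy[start:stop]))
    return 10.0 * math.log10(signal_energy / noise_energy)


def shrink_interval(start, stop, fraction):
    """Move both endpoints inward by `fraction` of the interval length."""
    margin = int(round(fraction * (stop - start)))
    return start + margin, stop - margin


# Toy data: a unit square wave with a constant offset of 0.1 as "noise".
clean = [1.0, -1.0] * 50
noisy = [s + 0.1 for s in clean]
start, stop = shrink_interval(0, len(clean), 0.10)
print(snr_db(clean, noisy), snr_db(clean, noisy, start, stop))
```

For real speech the trimmed interval excludes the low energy
onset and decay of the word, which is consistent with the SNR
rising as the interval is shortened.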

In comparing the vocoder output in figure IV-34 with the
noisy speech in figure IV-33, the most obvious difference is
that the vocoder output is much smoother than the noisy
speech. This is not surprising since LPC systems are only an
approximation, and with an order of only 10 it is harder to
pick up the higher frequency components. Another difference
between the plots is that occasionally the vocoder is not
excited by pitch impulses because it "thinks" that the
speech segment is unvoiced, as in figure IV-34.4. Comparing
the pitch period plots with and without noise (figures IV-25
(b) and IV-25 (a) respectively), it can be seen that there
are more unvoiced regions with noise than when noise isn't
present. It is interesting to observe the differences
between the vocoder outputs for the cases with and without
noise as inputs. Figure IV-19 without input noise and figure
IV-34 with input noise have subtle differences (excluding
the voiced/unvoiced problem) which are caused by the formant
structure.

Pitch period plots have been provided for the speech
files containing the numbers. Figures IV-17, IV-25, and
IV-31 are pitch plots of the speech files
"one...two...three...four", "five...six...seven...eight",
and "zero...nine...ten" respectively. As stated earlier,
without noise more of the pitch plot is voiced. The noise
makes the pitch detector believe that the speech isn't
voiced. This isn't too surprising since a synthesizer
simulates unvoiced speech with noise.

The final item to be discussed is the effect of noise on
the formant plots. Figures IV-35 through IV-38 are 3-D
formant plots of the utterances "one", "two", "three", and
"four" respectively. The separation between formant plots is
80 data points or 10 msec. Figure IV-39 is a series of 3-D
formant plots of the word "five" where the number after the
decimal point refers to the corresponding speech plot in
figure IV-20. Figures IV-35 through IV-38 are interesting,
and it is clear that the noise adds some high frequency
formants. Unfortunately, due to the problem described
earlier where the formant magnitude varies slowly, it is
difficult to determine if the jaggedness of the plots is due
to noise or to picking highs and lows of a formant. Because
of this, figure IV-39 is more useful. By comparing this
figure to figure IV-20, differences due to noise are much
easier to evaluate. The differences do not appear to be
drastic, but generally the shape of the formants appears
smoother and more high frequency formants exist in the case
with noise (figure IV-39).

CONCLUSION 

The  recursive  system  is  configured  in  such  a  manner  that 
the  predictor  coefficients  can  be  available  as  often  as 
needed.  As  would  be  expected,  updating  the  synthesizer  more 
often  improves  the  quality  of  speech.  The  greatest 
improvement  in  quality  is  noticed  when  the  synthesizer 
filter  is  updated  every  10  msec  instead  of  20  msec. 
Improvement  is  also  noted  when  updating  every  1.25  msec 
instead  of  10  msec  but  the  amount  of  improvement  wasn't  as 
large  as  the  change  from  20  msec  to  10  msec. 

The  data  rate  of  the  RLPC  vocoder  is  comparable  to 
typical  LPC  systems.  Judicious  use  of  additional 
information  available  in  the  RLPC  system  may  improve  the 
quality  of  speech  without  a  great  impact  on  the  data  rate. 

The RLPC system is affected by noise in both the pitch
detector and the speech analyzer. The pitch detector often
mistakes voiced speech for unvoiced speech, and the shape of
the higher frequency portion of the formant plots changes.

This table contains a description, list, and cross
reference of speech and formant plots. The formant plots
are produced using the predictor coefficients found in the
speech analysis. Therefore, only cross references between
synthesized speech plots and formant plots exist. The
number in parentheses is the time separation in msec between
formant plots. For a more detailed list see the List of

Speech                                                   Formant
Figure    Utterance                                      Figure

IV-4      Original-"The pipe began to rust while new"    None
IV-5      Vocoded-"The pipe began to rust while new"     IV-7 (1.25, 10, or 20)
IV-8      Original-"Thieves who rob friends deserve      None
          jail"
IV-9      Vocoded-"Thieves who rob friends deserve       None
          jail"
IV-11     Original-"One...two...three...four"            None
IV-12     Vocoder-"One...two...three...four"             see below
IV-12.1   Vocoder-"One"                                  IV-13 (10)
IV-12.4   Vocoder-"Two"                                  IV-14 (10)
IV-12.6   Vocoder-"Three"                                IV-15 (10)
IV-12.8   Vocoder-"Four"                                 IV-16 (10)
IV-18     Original-"Five...six...seven...eight"          None
IV-19     Vocoder-"Five...six...seven...eight"           see below
IV-19.1   Vocoder-"Five"                                 IV-20 (1.25)
IV-19.1   Vocoder-"Five"                                 IV-21 (10)
IV-19.1   Vocoder-"Five"                                 IV-32 (.125)
IV-19.3   Vocoder-"Six"                                  IV-22 (10)

Table IV-1 (cont'd)

(a) alpha=.97

Figure IV-2 Impulse Response of Window

Note: 1) Vertical axis - Magnitude of speech at sampled point
      2) Horizontal axis - 512 sampled points (10msec)
      3) Speech is plotted time sequentially from left to
         right then top to bottom

Figure IV-4.5 "The pipe began to rust while new"

Figure IV-5.1 The Vocoder Output
"The pipe began to rust while new"

Figure IV-7.3.3.a

    a - 1.25 msec between formant plots
    b - 10 msec between formant plots
    c - 20 msec between formant plots

    - Number corresponding to one plot on one page
      (numbered top to bottom)
    - Number corresponding to vocoded speech plot number
    - Formant figure number

Note: This example could correspond to the vocoder speech
plot figure IV-5.3, the third plot down from the top of the
page.

Figure IV-6 Skeleton Axis for Formant Plots

Figure IV-7.2.5.b

Figure IV-?

Figure IV-7.4.1.

Figure IV-7.4.5.

Figure IV-7.8.1.

Figure IV-7.8.2.

Figure IV-8.1 Original speech file
"Thieves who rob friends deserve jail"

Figure IV-8.4 "Thieves who rob friends deserve jail"

Figure IV-8.6 "Thieves who rob friends deserve jail"

Figure IV-8.7 "Thieves who rob friends deserve jail"

Figure IV-9.1 Synthesized Speech
"Thieves who rob friends deserve jail"

Figure IV-9.5 "Thieves who rob friends deserve jail"

Figure IV-11.7 "ONE...TWO...THREE...FOUR"

Figure IV-12.3 "ONE...TWO...THREE...FOUR"

Figure IV-12.4 "ONE...TWO...THREE...FOUR"

Figure IV-12.6 "ONE...TWO...THREE...FOUR"

Figure IV-17 Pitch plot of "ONE...TWO...THREE...FOUR"

Figure IV-18.1 "FIVE...SIX...SEVEN...EIGHT"

Figure IV-18.7 "FIVE...SIX...SEVEN...EIGHT"

Figure IV-20.1.3.a "FIVE"

Figure IV-21.b "FIVE"

Figure IV-25 Pitch Plot of "FIVE...SIX...SEVEN...EIGHT"

Figure IV-27.2 "ZERO...NINE...TEN"

Figure IV-32.1.2-138 The beginning of the word "FIVE"
(.125 msec between formant plots for all of figure IV-32)

Figure IV-33.1 Original Speech File
"FIVE...SIX...SEVEN...EIGHT" + noise

Figure IV-33.2 "FIVE...SIX...SEVEN...EIGHT" + noise

Figure IV-33.4 "FIVE...SIX...SEVEN...EIGHT" + noise

Figure IV-33.6 "FIVE...SIX...SEVEN...EIGHT" + noise

Figure IV-33.7 "FIVE...SIX...SEVEN...EIGHT" + noise

Figure IV-33.8 "FIVE...SIX...SEVEN...EIGHT" + noise

Figure IV-34.1 Vocoder Output with Input of
"FIVE...SIX...SEVEN...EIGHT" + noise

V. Recommendations


The  areas  for  future  research  which  either  use  the  RLPC 
vocoder  system  or  portions  of  the  system  include  the 
following: 

1.  Formal  listening  tests  (including  rhyme  tests)  could 
be  performed  to  arrive  at  a  quantitative  measure  of  how  the 
rate  of  updating  the  synthesis  filter  affects  the  quality  of 
speech.  The  use  of  a  variable  rate  system  could  be 
investigated  with  the  intent  of  finding  a  method  that  would 
determine  when  it  is  best  to  transmit  data. 

2.  A  method  to  perform  spectral  distance  tests  could  be 
created.  This  would  be  an  additional  measure  to  help  judge 
the  differences  between  the  RLPC  system  and  other  LPC 
systems.  Differences  less  than  3dB  have  been  shown  to 
represent  a  very  small  variation  between  systems. 

3. Additional methods for determining gain information
could be investigated. The method used in this thesis takes
the energy in the error signal to find the gain; possibly a
better method exists.

4. The recursive method for performing the
autocorrelation could be used for LPC analysis or formant
estimation (for example in a speaker verification system).
Typically the large amount of computation for on-line
implementations is a limiting factor; the recursive method
could be a partial solution to this problem. The recursive
architecture would work well with a VLSI circuit
realization, which might in turn suit a speaker verification
system.

APPENDIX A


Software


The software system is designed to be versatile, thereby
allowing the user a great deal of flexibility for
experimentation. The overall system design is shown in
figure A-1. The system has three major programs: the pitch
detector, a speech analyzer and a speech synthesizer.
Additionally, there are some utility programs. Each of the
main programs and the set of utility programs will be
described in a separate section. For details on using the
programs or substituting files see the user's manual in
appendix C.

Pitch  detectors 

As described in chapter III, two pitch detecting
algorithms were chosen. The first to be described is the
Center Clipping Autocorrelation Pitch Detector (AUTOC).
The software structure shown in figure A-2 closely parallels
the block diagram of the pitch detector in figure III-1.

From figure A-1 it can be seen that the pitch detector
uses two input files and one output file and is run from the
console as an autonomous unit. The first input file contains
initializing values for the various subroutines. These
values are the clipping level and the voiced/unvoiced
threshold. The other input file is a binary speech file of
two's complement numbers. The output file contains a list of
records, where one record of pitch information is created
for each 80 input speech samples. For an unvoiced frame the
record contains a "0" for unvoiced and a "0" for the pitch
period. For a voiced frame the record contains a "1" for
voiced and the current pitch period in units of the sampling
period (1/8000 sec). Additionally, a silence indicator is
set to "1" for speech (as opposed to silence). For silence
the record consists of three "0"s.

The logical program flow is shown in figure A-2. After
the system is initialized a segment of speech is read in.
The subroutine ENERGY computes the energy in each frame and
compares it with an unvoiced/voiced threshold. If the
energy is greater than the threshold, the frame is
considered voiced; otherwise it is unvoiced. Next the
subroutine SETMAX finds the largest absolute value in the
first third and in the last third of a frame. The smaller of
the two values is sent to the next subroutine, CLPPER.
CLPPER multiplies this value by a constant (a typical value
is .68) and the result is called the clipping level. Any
value between plus and minus the clipping level is set to
0.0. Any value above the positive clipping level is set to
1.0 and any value below the negative clipping level is set
to -1.0. This tri-level output sequence, created by CLPPER,
is the input to subroutine AUTCOR, which performs a short
time autocorrelation computation on the sequence. The
autocorrelation is performed using an 80 point sequence
shifted over the frame with a maximum of 160 shifts.
Finally, subroutine AUTCOR finds the first autocorrelation
peak greater than 80% of R(0). The appropriate information
(voiced/unvoiced, etc.) is output to the pitch file and the
whole process is started over if more speech is to be
processed.
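
The clipping and autocorrelation steps described above can
be sketched in a few lines. This is an illustrative
reconstruction and not the thesis source code: the function
names only loosely mirror SETMAX, CLPPER and AUTCOR, the
energy test of subroutine ENERGY is omitted, and the minimum
lag of 20 samples (used to skip the main lobe around lag
zero) is an assumption not stated in the text.

```python
def clip_level(frame, factor=0.68):
    """SETMAX/CLPPER step: factor times the smaller of the two peak values."""
    third = len(frame) // 3
    head_peak = max(abs(x) for x in frame[:third])
    tail_peak = max(abs(x) for x in frame[-third:])
    return factor * min(head_peak, tail_peak)


def center_clip(frame, level):
    """Tri-level clipper: +1 above +level, -1 below -level, else 0."""
    return [1.0 if x > level else -1.0 if x < -level else 0.0 for x in frame]


def autocorr(seq, lag, window=80):
    """Short time autocorrelation of an 80 point window shifted by lag."""
    return sum(seq[n] * seq[n + lag]
               for n in range(window) if n + lag < len(seq))


def pitch_period(frame, window=80, min_lag=20, max_lag=160, ratio=0.8):
    """Lag of the first autocorrelation peak above ratio*R(0), or 0 if none."""
    clipped = center_clip(frame, clip_level(frame))
    r0 = autocorr(clipped, 0, window)
    if r0 == 0.0:
        return 0  # nothing survived clipping
    first = min_lag - 1
    r = [autocorr(clipped, lag, window) for lag in range(first, max_lag + 2)]
    for i in range(1, len(r) - 1):
        if r[i] > ratio * r0 and r[i] >= r[i - 1] and r[i] >= r[i + 1]:
            return first + i
    return 0


# Toy input: an impulse train with a period of 50 samples.
frame = [1.0 if n % 50 == 0 else 0.0 for n in range(240)]
print(pitch_period(frame))
```

A returned value of 0 corresponds to the "0" pitch period
written to the pitch record when no acceptable peak is found.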

The second pitch detector has been thoroughly described
in Ref 9, with one exception: a silence detector was added.
In the same fashion as the other pitch detector, the energy
in the frame is determined, and if it exceeds the silence
threshold the frame is speech. If not, it is silence. To be
compatible with the overall system, the input and output
files are identical to those of the previous pitch detector.


Speech  analyzer 

The speech analyzer has one input file and one output
file and is run as an autonomous unit from the console. The
input file contains initializing values and the output file
is a set of predictor coefficients. The flow chart of the
speech analyzer is shown in figure A-3.

Four values are used from the initialization file. The
first real value is the filter shape coefficient. This is
the value described in chapter II and it determines the
window shape of the recursive autocorrelator (with a typical
value of .98). The second real value allows the program
user to scale the input speech (a typical value is 1.0).
The first integer variable determines the separation (in
sample points) of the filter coefficients that are output.
For example, if the variable is 80 then a set of filter
coefficients is written to the output file every 80 points.
The second integer variable is a "1" if the user wishes to
have the speech pre-emphasized, otherwise a "0".

Initially one block (256 integer values) of speech is
read in. Subroutine RLPC recursively computes the
autocorrelation values for each speech sample in the block.
Subroutine INVFIL uses the autocorrelation values and
calculates a set of LPC predictor coefficients. At the
interval determined by the integer filter separation
variable the filter coefficients are written to the output
file. Then another block of speech is read in. This process
continues until there is no more speech.
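
The analyzer loop can be sketched as follows. Two assumptions
are made that the text above does not confirm: the recursive
autocorrelator is modelled as a first order exponential
window, R_k(n) = alpha*R_k(n-1) + s(n)s(n-k), which may differ
from the window defined in chapter II, and the predictor
coefficients are obtained with the Levinson-Durbin recursion,
whereas the method used inside INVFIL is not stated here.
Scaling and pre-emphasis are omitted and all names are
hypothetical.

```python
def recursive_autocorr(speech, order=10, alpha=0.98):
    """Yield the autocorrelation estimates R[0..order] after every sample."""
    r = [0.0] * (order + 1)
    history = [0.0] * (order + 1)  # history[k] holds s(n-k)
    for s in speech:
        history = [float(s)] + history[:-1]
        r = [alpha * r[k] + history[0] * history[k] for k in range(order + 1)]
        yield list(r)


def levinson_durbin(r, order):
    """Return (a, err) where s(n) is predicted by sum of a[k-1]*s(n-k)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # no energy left to predict
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def analyze(speech, order=10, alpha=0.98, separation=80):
    """Write out one coefficient set every `separation` samples."""
    frames = []
    for n, r in enumerate(recursive_autocorr(speech, order, alpha), start=1):
        if n % separation == 0:
            frames.append(levinson_durbin(r, order)[0])
    return frames


# Toy input: impulse response of a one pole filter with its pole at 0.9.
speech = [0.9 ** n for n in range(200)]
print(analyze(speech, order=1, alpha=1.0, separation=100))
```

Because the autocorrelation is updated at every sample, a new
coefficient set is available at any chosen separation without
recomputing a windowed frame.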

Speech  synthesizer 

The  speech  synthesizer  has  three  input  files  and  one 
output  file  and  is  run  as  an  autonomous  unit  from  the 
console.  The  first  input  file  contains  initializing  values, 
the  second  is  a  file  of  LPC  predictor  coeficients,  and  the 
third  is  a  file  of  pitch  data.  The  synthesized  speech  is 
written  to  the  output  file.  The  flow  chart  of  the  speech 
synthesizer  is  shown  in  figure  A-4. 

First, subroutine IOF reads in the input and output file
names. Three values are used from the initialization file.
The first value is a real number that allows the user to
change the emphasis on the voiced and unvoiced input to the
digital filter synthesizer. The next two initialization
values are integers. The first is the separation between
frames and the second is a "1" if de-emphasis is to be
applied or a "0" if not.

If the current segment of speech to be produced is
voiced, the subroutine VOICED is called. VOICED produces a
sequence u(n) (the glottal pulse) that is as long as the
current pitch period. The current pitch period (at the
current frame's beginning) is found using linear
interpolation. If the current segment is unvoiced, then
subroutine UNVOCD is called. UNVOCD produces a normally
distributed random sequence u(n) 80 samples long.
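
A rough sketch of the excitation step is given below. It is written in Python rather than the thesis's FORTRAN, and it makes several assumptions: pitch values arrive once every 80 samples, the voiced excitation is the simple impulse option rather than the shaped glottal pulse, and Python's built-in Gaussian generator replaces the author's own normal random number routine. All names, and the default peak and ratio values, are illustrative only.

```python
import random


def interpolate_pitch(pp_prev, pp_next, offset, frame_len=80):
    """Linearly interpolate the pitch period `offset` samples into a frame."""
    ratio = float(offset) / frame_len
    return int(ratio * (pp_next - pp_prev)) + pp_prev


def voiced_impulse(period, peak=1000.0):
    """One pitch period of excitation: a single impulse followed by zeros."""
    return [peak] + [0.0] * (period - 1) if period > 0 else []


def unvoiced_noise(rng, length=80, peak=1000.0, ratio=0.1):
    """Normally distributed excitation, scaled down relative to voiced speech."""
    return [peak * ratio * rng.gauss(0.0, 1.0) for _ in range(length)]


rng = random.Random(203)                  # arbitrary fixed seed for repeatability
period = interpolate_pitch(40, 60, 20)    # a quarter of the way into the frame
u_voiced = voiced_impulse(period)
u_unvoiced = unvoiced_noise(rng)
```

Because the voiced sequence is exactly one pitch period long, successive calls place pulses back to back, so the pulse spacing follows the interpolated pitch.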

Whether voiced or unvoiced, the sequence u(n) is passed
to subroutine THROAT. This subroutine uses u(n) as the input
to a direct form digital filter. The coefficients of the
filter come from the LPC predictor coefficients found in the
program CODER. The user chooses how often the coefficients
are updated, so the filter coefficient updates are
asynchronous with the pitch. The output
sequence of the filter is the synthesized speech. If
selected, the speech is de-emphasized, and then, with or
without de-emphasis, the speech is written to the output
file. If more speech is to be produced, the process starts
all over by checking for voiced or unvoiced; otherwise
processing stops.
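
The filtering and de-emphasis steps can be sketched in a few lines. This Python is an illustration under stated assumptions, not the THROAT routine itself: the filter is taken to be the all-pole direct form s(n) = G u(n) - sum a(k) s(n-k), and de-emphasis is taken to be y(n) = x(n) + 0.9 y(n-1), matching what the SYNTH and THROAT listings appear to compute. Coefficient updates between frames are omitted, and the function names are hypothetical.

```python
def synthesize(excitation, coeffs, gain=1.0, state=None):
    """All-pole direct form filter: s[n] = gain*u[n] - sum(a[k] * s[n-k])."""
    order = len(coeffs)
    # delay[k-1] holds the output delayed by k samples
    delay = list(state) if state is not None else [0.0] * order
    out = []
    for u in excitation:
        s = gain * u - sum(a * d for a, d in zip(coeffs, delay))
        out.append(s)
        delay = ([s] + delay)[:order]
    return out, delay


def deemphasize(samples, coeff=0.9, prev=0.0):
    """Inverse of first-difference pre-emphasis: y[n] = x[n] + coeff * y[n-1]."""
    out = []
    for x in samples:
        prev = x + coeff * prev
        out.append(prev)
    return out, prev


speech, delay = synthesize([1.0, 0.0, 0.0], coeffs=[-0.5])
smoothed, last = deemphasize(speech)
```

Returning the delay line and the last de-emphasized sample lets the caller carry filter state from one excitation segment to the next, which matters because segments vary in length with the pitch period.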


Utility  software 

Five utility programs were created to allow the user to
look at a variety of outputs and put them in a useful form.
The primary use of each program is stated below:

The  first  program  SCALE1  takes  the  vocoder  output  and 
scales  the  speech  to  the  proper  magnitude  for  the  D/A 
converter. 

The  second  program  PLTTER  either  plots  the  pitch 
detector  output  or  a  speech  file. 

The  third  program  SETUP  allows  the  user  to  update  a  file 
of  parameters  that  are  the  initialization  values  for  the 
vocoder. 

The  fourth  program  KEVAL  lets  the  user  create  a  file 
which  is  the  input  to  a  3-D  plotter. 

The final program FORMANT allows the user to create an
artificial vowel using poles or a set of filter coefficients.

SCALE1 requires one input file and one output file.
The input file is any binary (blocked) speech file and the
output file becomes a scaled speech file with a maximum of
2000.0. One additional option allows uniform noise to be
added to the speech file. The speech file, plus noise, is
still scaled to a maximum of 2000.0. The maximum value for
the noise is input at the console and scales the
uniform (-0.5 to 0.5) density function. For example, if
500.0 is entered at the console the noise falls between
-250.0 and 250.0 inclusive. The subroutine DRAND is
called to generate each random number. This program or an
equivalent one must be used if the user is going to output
speech on the D/A converter. The D/A converter only accepts
numbers between -2048 and 2047, and the output of SCALE1
meets this requirement.
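
The scaling rule can be sketched as below. This is a Python illustration rather than the SCALE1 source; it assumes that the noise is added before the peak is found, and it uses Python's `random` module in place of DRAND. The function and parameter names are hypothetical.

```python
import random


def scale_speech(samples, peak=2000.0, noise_max=0.0, rng=None):
    """Optionally add uniform noise, then scale so the largest magnitude is `peak`."""
    rng = rng or random.Random(0)
    # noise_max scales a uniform (-0.5, 0.5) variate, so 500.0 gives about +/-250
    noisy = [s + noise_max * (rng.random() - 0.5) for s in samples]
    largest = max((abs(v) for v in noisy), default=0.0)
    if largest == 0.0:
        return [0] * len(noisy)
    return [int(round(v * peak / largest)) for v in noisy]


clean = scale_speech([1, -2, 4])
noisy = scale_speech([100, -50, 25], noise_max=500.0)
```

Scaling to 2000 rather than to the converter limit of 2047 leaves a small margin, so rounding can never push a sample outside the -2048 to 2047 range.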

The program PLTTER requires one input file. The file
name is entered at the console and this file can be either
the file output from the pitch detector for a pitch plot or
a speech file for a plot of speech. If a pitch plot is
required, two plots are output on the PRINTRONIX printer.
The top plot is a plot of the pitch period and the bottom is
a plot of silence (a "1") or speech (a "0"). The second
possible output is a plot of the speech file. Each page has
ten plots of speech except the last page, which has as many
plots as are needed for the remaining speech.
The only subroutine called is PLOT10, which is available in
the Data General system. If the user requires constant scales
on plot output, two statements must be added to the system's
PLOT10. The two statements follow the line of code "CALL
ASCALE (Y,..." and are as follows:

FY=-3000.0

DY= 3000.0

These set the scales to a maximum value of 3000.0. PLTTER
automatically scales the speech to a maximum value of
3000.0.

SETUP requires one input file and either uses the input
file as an output file or uses an additional file for
output. The purpose of this program is to allow the user to
change the initial conditions of the vocoder by changing the
initial conditions file. The input file is normally a file
already containing initial conditions. If a new file is to be
created, the program sets all values to zero and then the
user inputs initial conditions from the console. If the
input file contains initial conditions, the user is allowed
to change any value. A list of the initial conditions can
be output to the PRINTRONIX printer and/or the console.
Finally, the initial conditions can overwrite the input file
or be put in a new file. No subroutines are called by this
program.

KEVAL has one input file and, if the option is selected,
one output file. The input file is a file of predictor
coefficients output from the vocoder. The first value in
the file is the order (p) of the filter and the remaining
values are sets of predictor coefficients of length p. One
of three possible outputs is available each time the
program is run. In effect, three different methods of viewing
how the predictor coefficients vary were created. The first
method allows the user to select a sequence of predictor
coefficient sets, which KEVAL uses to find a log magnitude
plot for each coefficient set. These log magnitude plots are
output to a file. This file is formatted to be input to the
program PLOT3D, which is on the Data General system. PLOT3D
is then used to make a 3-D formant plot. The second type of
output allows the user to plot a series of formant plots on
the PRINTRONIX printer. The third type of output allows the
user to use the Tektronix 4010-1 terminal to create linear
magnitude plots, log magnitude plots, phase plots and/or
impulse response plots for subsequent predictor coefficient
sets. The required system subroutines are PLOTIO for
Tektronix plots and PLOT5 for Printronix plots.
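
As an illustration of the first output type, the sketch below evaluates a log magnitude spectrum from one set of predictor coefficients. It is written in Python and is not the KEVAL code: it assumes the synthesis filter is G/A(z) with A(z) = 1 + a(1)z^-1 + ... + a(p)z^-p, evaluates it at evenly spaced frequencies from 0 up to half the sampling rate, and uses hypothetical names throughout.

```python
import cmath
import math


def log_magnitude(coeffs, n_points=128, gain=1.0):
    """Return 20*log10(gain / |A(e^jw)|) at n_points frequencies in [0, pi)."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        a = 1.0 + sum(c * cmath.exp(-1j * w * (k + 1))
                      for k, c in enumerate(coeffs))
        # guard against a pole lying exactly on the unit circle
        spectrum.append(20.0 * math.log10(gain / max(abs(a), 1e-12)))
    return spectrum


curve = log_magnitude([-0.5], n_points=4)
```

Stacking one such curve per coefficient set, in time order, gives the surface that a 3-D formant plot displays; peaks in each curve mark the formants.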

FORMANT is a program that was developed originally to
create a file containing the output of an artificial vowel
that is impulsed at a constant frequency. An input file can
be used to enter the vowel's filter coefficients either in
direct form or as poles and zeros. If the input is a file
of poles and zeros, then these are converted to coefficients
in the direct form using the subroutines EXPAND and POLYMLT.
If input files are not available, the user can enter either
poles and zeros or direct form filter coefficients through
the console. If the user wants to use the same predictor
coefficients later, they can be saved in an output file in
the direct form. FORMANT impulses the digital filter (which
uses the predictor coefficients) at the chosen impulse
frequency. The output from this process is then written to
the output file. The user can then have the impulse response
plotted using the system routine PLOTIO.
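
The conversion from poles to direct form, and the periodic impulsing, can be sketched as follows. This Python is only an illustration of the idea: it handles poles only (no zeros), it is not the EXPAND or POLYMLT source, and every name in it is hypothetical. Complex poles are assumed to be supplied together with their conjugates so that the resulting coefficients are real.

```python
def poly_multiply(p, q):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expand_roots(roots):
    """Expand prod(1 - r*z^-1) into direct form coefficients [1, a1, a2, ...]."""
    poly = [1.0]
    for root in roots:
        poly = poly_multiply(poly, [1.0, -root])
    return [complex(c).real for c in poly]


def impulse_train_response(denominator, period, length):
    """Drive the all-pole filter 1/A(z) with a unit impulse every `period` samples."""
    history = [0.0] * (len(denominator) - 1)
    out = []
    for n in range(length):
        u = 1.0 if n % period == 0 else 0.0
        y = u - sum(c * h for c, h in zip(denominator[1:], history))
        out.append(y)
        history = ([y] + history)[:len(denominator) - 1]
    return out


a = expand_roots([0.5j, -0.5j])           # conjugate pair on the imaginary axis
vowel = impulse_train_response(a, period=4, length=8)
```

Multiplying one first-order factor at a time keeps the expansion simple; for a conjugate pair the imaginary parts cancel, leaving the familiar real second-order section.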

In conclusion, each of the utility programs provides a
different way to make information from the vocoder more
useful.
This appendix contains listings of all the software used
in  the  vocoder  including  utility  software.  What  follows  is 
a  table  of  contents  for  the  software. 
C**********************************************************************
C  PROGRAM:   CODER
C  AUTHOR:    WILL JANSSEN
C  DATE:      6 JULY 80
C  LANGUAGE:  FORTRAN5
C  FUNCTION:  THIS PROGRAM INPUTS A SPEECH FILE (NO PARTICULAR
C             LENGTH REQUIRED) AND PRODUCES A FILE OF LPC
C             COEFFICIENTS.
C
C  LOAD LINE: RLDR CODER RLPC INVFIL IOF CFLIB*
C  RUN LINE:  CODER SI/I PRAM/I COEF/I
C
C  INPUTS:    FILSPI-IS THE FIRST INPUT FILE WHICH IS AN INTEGER
C             SPEECH FILE (BINARY)
C
C             PARAM-2ND INPUT FILE (OUTPUT OF SETUP) CONTAINING 25
C             RECORDS OF REAL DATA THEN 25 RECORDS OF INTEGER DATA.
C             THIS DATA IS USED TO INITIALIZE VARIOUS PARTS OF THIS
C             PROGRAM (RECOMMEND USING UTILITY PROGRAM SETUP TO
C             INITIALIZE THIS FILE).  DATA FROM THIS FILE USED IN
C             THIS PROGRAM:
C             RECORD#4-RPARAM(4)->RLPC WINDOW SHAPE (TYPICALLY .98)
C             RECORD#9-RPARAM(9)->SPEECH SCALING FACTOR (TYP. 1.0)
C             RECORD#27-IPARAM(2)-># OF POINTS BETWEEN OUTPUT LPC
C                       FILTER COEFFICIENTS.
C             RECORD#28-IPARAM(3)->(1-USE PRE-EMPHASIS, 0-DON'T)
C
C  OUTPUT:    COFA-THIS FILE IS A BINARY FILE CONTAINING LPC PREDICTOR
C             COEFFICIENTS WHERE THE FIRST VALUE IS THE FILTER ORDER
C             (VARIABLE NORDER).  THE REST OF THE FILE IS ORGANIZED AS
C             SETS OF LPC FILTER COEFFICIENTS (NORDER OF THEM PER SET)
C             WHERE THE 0TH POSITION IS ASSUMED TO EQUAL 1.0.
C
C**********************************************************************

      DIMENSION SP1(-80:256),COEF(256,0:10),SP1OLD(2),
     X G(256),RPARAM(25),IPARAM(25)
      INTEGER FILUFD(18),SPEECH(256),FILSPI(7),PARAM(7),COFA(7),
     X AUTOAN(7)
      DOUBLE PRECISION W(0:3,0:10)
C
C***READ INPUT AND OUTPUT FILES FOR DATA
C
      IENDS = 0
      NUMB = 0
      SP1OLD(1) = 0.0
      SP1OLD(2) = 0.0
      NFILES = 3
      CALL IOF(NFILES,MAIN,FILSPI,PARAM,COFA,AUTOAN,
     X MS,S1,S2,S3,S4)
      CALL OPEN(1,FILSPI,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ",IER
      MBLOCKS = 1            ;SET FOR READING ONE BLOCK AT A TIME
      CALL OPEN(2,PARAM,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ON",PARAM(1),IER
      DO 10 I=1,25
      READ FREE(2) RPARAM(I)
10    CONTINUE
      DO 11 I=1,25
      READ FREE(2) IPARAM(I)
11    CONTINUE
      CALL CLOSE(2,IER)
      IF(IER .NE. 1)TYPE"CLOSE FILE ",PARAM(1),IER
      CALL DFILW(COFA,IER)
      IF(IER .EQ. 13)GO TO 60
      IF(IER .NE. 1)TYPE"DELETE ERROR",IER
60    CALL CFILW(COFA,2,IER)
      IF(IER .NE. 1)TYPE"CREATE FILE ERROR ",IER
      CALL OPEN(3,COFA,3,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ON",COFA(1),IER
C
C***ZERO SPEECH ARRAY FOR FIRST RUN
C
      ISAV = 80
      ITOT = ISAV + 257
      IS = ISAV + 1
      DO 100 KN=1,ITOT
      II = KN - (ISAV + 1)
      SP1(II) = 0.0
100   CONTINUE
      A = RPARAM(4)          ;FILTER SHAPE COEFFICIENT
      SSCALE = RPARAM(9)     ;SET SCALE FACTOR TO REDUCE SPE. MAG.
      NPOINS = IPARAM(2)     ;FILTER POINTS SEPARATION
      IPRE = IPARAM(3)
      NCUR = 1
      NORDER = 10            ;ORDER OF THE FILTERS
C
C***PASS FILTER ORDER TO SYNTH
C
      WRITE BINARY(3) NORDER
C
C*** ZERO FILTER FOR FIRST RUN
C
      DO 140 I=0,3
      DO 140 II=0,10
      W(I,II) = 0.0D0
140   CONTINUE
C
C***PROCESS THE DATA
C
      NV = 0
145   CONTINUE
      IF(NV .EQ. 0)GO TO 175         ;SKIP FIRST TIME
      DO 150 NT=1,ISAV               ;SAVE LAST 80 SPEECH VALUES
      NT1 = NT - ISAV
      NT2 = NT + 256 - ISAV
      SP1(NT1) = FLOAT(SPEECH(NT2))/SSCALE    ;SCALE SPEECH TO MAX 3
150   CONTINUE

C
C***READ IN SPEECH
C
175   CALL RDBLK(1,NV,SPEECH,MBLOCKS,IENDS)
      IF((IENDS .NE. 1).AND.(IENDS .NE. 9))TYPE"READ BLOCK ERROR ",IENDS
      DO 300 I=1,256
      SP1(I) = FLOAT(SPEECH(I))/SSCALE
300   CONTINUE
C
C   PRE-EMPHASIZE THE SPEECH
C
      IF(IPRE .EQ. 0)GO TO 310
      DO 300 I=-75,256
      SP1OLD(1) = SP1(I)
      SP1(I) = SP1(I) - .9 * SP1OLD(2)
      SP1OLD(2) = SP1OLD(1)
300   CONTINUE
310   CONTINUE
      CALL RLPC(SP1,NORDER,W,A,COEF,G)
C
C***WRITE PREDICTOR COEFFICIENTS THEN GAIN TO COEFFICIENT FILE FOR EACH FRAME
C
500   IF(NCUR .GT. 256)GO TO 1000
      DO 900 KB=1,NORDER
      WRITE BINARY(3) COEF(NCUR,KB)
900   CONTINUE
      WRITE BINARY(3) G(NCUR)
      NCUR = NCUR + NPOINS
      GO TO 500
1000  CONTINUE
      NCUR = NCUR - 256
      NUMB = NUMB + 256
C     TYPE"JUST COMPLETED ",NUMB," POINTS"
      NV = NV + 1
      IF(IENDS .NE. 9)GO TO 145
9000  CONTINUE
9500  CONTINUE
      CALL CLOSE(1,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON FILSPI ",FILSPI(1),IER
      CALL CLOSE(3,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON COFA ",COFA(1),IER
      CALL CLOSE(4,IER)
      STOP
      END
      SUBROUTINE IOF(N,MAIN,F1,F2,F3,F4,MS,S1,S2,S3,S4)
C***************************************************************
C  ADAPTED FROM SUBROUTINE WRITTEN BY LT SIMMONS 10 SEPT 81
C
C  THIS FORTRAN 5 SUBROUTINE WILL READ FROM THE FILE
C  COM.CM (FCOM.CM IN THE FOREGROUND) THE PROGRAM NAME,
C  ANY GLOBAL SWITCHES, AND UP TO THREE LOCAL FILE
C  NAMES AND CORRESPONDING LOCAL SWITCHES.
C
C  ARGUMENTS:
C
C  N IS THE NUMBER OF LOCAL FILES AND SWITCHES TO BE
C  READ FROM (F)COM.CM.  N MUST BE 1, 2, OR 3.
C
C  MAIN IS AN ASCII ARRAY FOR THE MAIN PROGRAM FILE NAME.
C
C  F1, F2, F3, AND F4 ARE THE THREE ASCII ARRAYS TO RETURN
C  THE LOCAL FILE NAMES.
C
C  MS IS A TWO-WORD INTEGER ARRAY THAT HOLDS ANY GLOBAL
C  SWITCHES.
C
C  S1, S2, S3, AND S4 ARE TWO-WORD INTEGER ARRAYS THAT
C  HOLD THE LOCAL SWITCHES CORRESPONDING TO F1 THROUGH
C  F4 RESPECTIVELY.
C
C***************************************************************
C
C  DIMENSION
C
      DIMENSION MAIN(7),MS(2)
      INTEGER F1(7),F2(7),F3(7),F4(7),S1(2),S2(2),
     X S3(2),S4(2)
C  CHECK BOUNDS ON N
      IF(N.LT.1.OR.N.GT.4)STOP             ;N OUT OF BOUNDS
C  PROCESS THE DATA IN (F)COM.CM
      CALL GROUND(I)                       ;FIND OUT WHICH GROUND PROGRAM IS IN
      IF(I.EQ.0)OPEN 0,"COM.CM"            ;OPEN CH. 0 TO COM.CM
      IF(I.EQ.1)OPEN 0,"FCOM.CM"           ;OPEN CH. 0 TO FCOM.CM
      CALL COMARG(0,MAIN,MS,IER)           ;READ FROM (F)COM.CM
      IF(IER.NE.1)TYPE" COMARG ERROR: ",IER
      WRITE(10,1)MAIN(1)                   ;TYPE PROGRAM NAME
1     FORMAT(' PROGRAM ',S13,' RUNNING.')
      CALL COMARG(0,F1,S1,JER)             ;READ FROM (F)COM.CM
      IF(JER.NE.1)TYPE" COMARG ERROR (F1): ",JER
      IF(N.EQ.1)GO TO 2                    ;TEST N
      CALL COMARG(0,F2,S2,KER)             ;READ FROM (F)COM.CM
      IF(KER.NE.1)TYPE" COMARG ERROR (F2): ",KER
      IF(N.EQ.2)GO TO 2                    ;TEST N
      CALL COMARG(0,F3,S3,LER)             ;READ FROM (F)COM.CM
      IF(LER.NE.1)TYPE" COMARG ERROR (F3): ",LER
      IF(N.EQ.3)GO TO 2                    ;TEST N
      CALL COMARG(0,F4,S4,NER)             ;READ FROM (F)COM.CM
      IF(NER.NE.1)TYPE" COMARG ERROR (F4): ",NER
2     CLOSE 0
      RETURN
      SUBROUTINE RLPC(SP1,NORDER,W,A,COEF,G)
C**********************************************************************
C
C  THIS SUBROUTINE CALCULATES THE RECURSIVE AUTOCORRELATION FOR
C  EACH FRAME WHERE THE FRAME CAN VARY FROM ONE SAMPLE TO 80
C  SAMPLES / FRAME
C
C  INPUTS: SP1   -THIS IS THE INPUT SPEECH
C          NORDER-THIS IS THE ORDER OF THE RECURSIVE AUTOCORRELATION
C                 SYSTEM
C          W     -THIS IS THE INTERMEDIATE NODAL VALUES OF THE
C                 RLPC FILTER
C          A     -THIS IS ALPHA THE VALUE OF THE WINDOW SHAPE
C
C  OUTPUT: COEF  -THIS IS A SET OF PREDICTOR COEFFICIENTS THAT
C                 COME FROM A SUBROUTINE CALL TO INVFIL
C          G     -THIS IS THE GAIN ALSO FROM INVFIL
C
C**********************************************************************
      DIMENSION SP1(-80:256),G(256),COEF(256,0:10),
     X T(11),COE(0:10)
      DOUBLE PRECISION R(0:10),W(0:3,0:10),TEMP1,TEMP2,S(0:10)
C     TYPE"RUNNING RLPC"
      A0 = 1.0
      A1 = 3.0 * (A**2)
      A2 = -3.0 * (A**4)
      A3 = A**6
C
C***PROCESS 256 SPEECH SAMPLES THROUGH THE FILTER
C
      DO 1300 N=1,256
C
C***SET UP DELAY PRODUCTS INTO FILTER
C
      DO 500 I=0,NORDER
      S(I) = DBLE(SP1(N-I) * SP1(N))      ;INPUT TO FILTERS
500   CONTINUE
C
C***CALCULATE R(K) FOR EACH FILTER
C
      DO 600 K=0,NORDER
      W(0,K) = A1*W(1,K)+A2*W(2,K)+A3*W(3,K)+S(K)
      TEMP1 = (K+1) * (A**K) * W(0,K)
      TEMP2 = (K-1) * (A**(K+2)) * W(1,K)
      R(K) = TEMP1 - TEMP2
C     TYPE"R=",R(K)," N= ",N," K= ",K
600   CONTINUE
      DO 630 KS=1,NORDER
      R(KS) = R(KS)/R(0)
630   CONTINUE
      R0S = SNGL(R(0))
      R(0) = 1.0
C     ACCEPT "NEXT?",NYZKUH
C
C*** PREPARE W FOR NEXT AUTOCORRELATION
C
      DO 700 K=0,NORDER
      W(3,K) = W(2,K)
      W(2,K) = W(1,K)
      W(1,K) = W(0,K)
AD-A138 008  A RECURSIVE LINEAR PREDICTIVE VOCODER  4/4
AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL OF ENGINEERING
W A JANSSEN  DEC 83  AFIT/GE/EE/83D-33  UNCLASSIFIED  F/G 17/2  NL

      W(0,K) = 0.0
700   CONTINUE
      CALL INVFIL(R,NORDER,COE,GF)
      G(N) = SQRT(R0S * GF)/100000.0
      DO 723 IWHY=0,NORDER
      COEF(N,IWHY) = COE(IWHY)
723   CONTINUE
      DO 9000 IB=0,NORDER
      IB1 = IB + 1
      T(IB1) = SNGL(R(IB))
9000  CONTINUE
      NODE = 1
      NO = 1
      NBUT = NORDER + 1
      IFSCL = 0
      ACCEPT"DO NEXT ONE?",NY
      CALL GRPH2("AUTCOR",NO,T,U,NBUT,NODE,YN,YA,IFSCL)
      TYPE"N = ",N
1300  CONTINUE
      RETURN
C  THIS SUBROUTINE IS THE INVERTER PORTION OF THE SUBROUTINE
C  AUTO AS PRESENTED IN MARKEL & GRAY
C
C  INPUTS : R5    -THIS IS A SET OF AUTOCORRELATION VALUES PASSED
C                  FROM THE SUBROUTINE RLPC
C           NORDER-THIS IS THE ORDER OF THE RLPC SYSTEM
C
C  OUTPUTS: COE   -THIS IS A SET OF PREDICTOR COEFFICIENTS THAT ARE
C                  FOUND BY INVERTING THE AUTOCORRELATION MATRIX
C           GF    -THE CURRENT GAIN OF THE SYSTEM
C
      SUBROUTINE INVFIL(R5,NORDER,COE,GF)
      DIMENSION COE(0:10)
      DOUBLE PRECISION R5(0:10),R(11),RC(21),A(20),S,ALPHA,AT
C
      M = NORDER
      MP=M+1
      DO 15 K=1,MP
      R(K) = R5(K-1)
15    CONTINUE
X     WRITE(12,999)(R(I),I=1,4)
X     WRITE(12,998)(R(I),I=5,MP)
X998  FORMAT(3X,6(G16.8,1X))
X999  FORMAT(1X,6(G16.8,1X))
      RC(1)=-R(2)/R(1)
      A(1)=1.
      A(2)=RC(1)
      ALPHA=R(1)+R(2)*RC(1)
      DO 40 MINC=2,M
      S=0.0D0
      DO 20 IP=1,MINC
      S=S+R(MINC-IP+2)*A(IP)
20    CONTINUE
      RC(MINC)=-S/ALPHA
      MH=MINC/2+1
      DO 30 IP=2,MH
      IB=MINC-IP+2
      AT=A(IP)+RC(MINC)*A(IB)
      A(IB)=A(IB)+RC(MINC)*A(IP)
      A(IP)=AT
30    CONTINUE
      A(MINC+1)=RC(MINC)
      ALPHA=ALPHA+RC(MINC)*S
      IF(ALPHA)50,50,40
40    CONTINUE
50    DO 300 NT=0,NORDER
      COE(NT) = SNGL(A(NT+1))
300   CONTINUE
X     WRITE(12,1001)
X     WRITE(12,1002)(COE(I),I=0,4)
X     WRITE(12,1002)(COE(I),I=5,M)
X1002 FORMAT(1X,6(D16.8,1X))
X1001 FORMAT(1X,"ALPHAS ")
      GF = SNGL(ALPHA)
      RETURN
C**********************************************************************
C  PROGRAM:   SYNTH
C  AUTHOR:    WILL JANSSEN
C  DATE:      13 JULY 83
C  LANGUAGE:  FORTRAN5
C  FUNCTION:  THIS PROGRAM INPUTS A PREDICTOR COEFFICIENT
C             FILE AND PITCH INFORMATION TO SYNTHESIZE
C             SPEECH.
C
C  LOAD LINE: RLDR SYNTH VOICED THROAT UNVOCD CFLIBQ
C  RUN LINE:  SYNTH PRAM/I COEF/I SPE/O SNOUT/I
C
C  INPUTS:    PARAM-1ST INPUT FILE (OUTPUT OF SETUP) CONTAINING
C             25 RECORDS OF REAL DATA THEN 25 RECORDS OF INTEGER
C             DATA.  THIS DATA IS USED TO INITIALIZE VARIOUS PARTS
C             OF THIS PROGRAM (RECOMMEND USING UTILITY PROGRAM SETUP
C             TO INITIALIZE THIS FILE).  DATA FROM THIS FILE USED IN
C             THIS PROGRAM:
C             RECORD#7-RPARAM(7)->RATIO OF UNVOICED TO VOICED IMPULSE
C                      (OPTIMUM AROUND .1)
C             RECORD#8-RPARAM(8)->POS. SLOPE PORTION OF GLOTTAL PULSE
C                      (EX. .1 WOULD BE 1/10 OF CURRENT
C                      PITCH PERIOD)
C             RECORD#9-RPARAM(9)->NEG. SLOPE PORTION OF GLOTTAL PULSE
C                      (UNITS SAME AS ABOVE)
C             RECORD#27-IPARAM(2)->NUMBER OF POINTS BETWEEN FRAMES
C             RECORD#28-IPARAM(3)->(1-USE DE-EMPHASIS, 0-DON'T)
C             RECORD#29-IPARAM(4)->(1-USE GLOTTAL PULSE, 0-IMPULSE)
C
C             COFA-2ND INPUT FILE IS A BINARY FILE (OUTPUT OF CODER)
C             WHERE THE FIRST VALUE IS THE FILTER ORDER (VARIABLE
C             NORDER).  THE REST OF THE FILE IS ORGANIZED AS SETS OF
C             LPC FILTER COEFFICIENTS (NORDER OF THEM PER SET) WHERE
C             THE 0TH POSITION IS ASSUMED TO EQUAL 1.0.
C
C             PTCHIF-3RD INPUT FILE (OUTPUT OF PITCH) CONTAINING 3
C             VALUES PER RECORD (SEE LINE 470 FOR THE FORMAT)
C             VOICD2-THE FIRST VALUE (1-FOR VOICED SPEECH, 0-FOR
C                    UNVOICED IF NOT SILENCE)
C             SIL2-THE SECOND VALUE (1-FOR SPEECH, 0-FOR SILENCE)
C             PP2-THE THIRD VALUE IS THE PITCH PERIOD (IN SAMPLES)
C
C  OUTPUTS:   SPCH-IS THE OUTPUT FILE WHICH IS AN INTEGER FILE OF THE
C             GENERATED SPEECH.
C
C**********************************************************************
      DIMENSION RPARAM(25),IPARAM(25),SPEECH(256),
     X U(180),FILTER(20),S(180),W(0:20)
      INTEGER PARAM(7),COFA(7),PTCHIF(7),SPCH(7),VOICD2,VOICD1,
     X SIL1,SIL2,PP1,PP2,FRMPOS,PPPOS1,PPPOS2,PPF,FRMSIZ,
     X X(256),INTS(256)
      DOUBLE PRECISION IX
      DATA IS,IP,KS,KEND/1,0,0,0/
C
C***READ INPUT/OUTPUT FILE NAMES
C
      IX = DBLE(203)
C
C*** ZERO SYNTHESIS FILTER
C
      DO 3 I=0,20
      W(I) = 0.0
3     CONTINUE
      ALPHA = 1000.0         ;OUTPUT GAIN VALUE
      NFILES = 4
      CALL IOF(NFILES,MAIN,PARAM,COFA,SPCH,PTCHIF,MS,S1,
     X S2,S3,S4)
      TYPE"PAST IOF"
      CALL OPEN(2,PARAM,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ON ",PARAM(1),IER
C
C*** INPUT INITIALIZING DATA
C
      DO 10 I=1,25
      READ FREE(2) RPARAM(I)
10    CONTINUE
      DO 11 I=1,25
      READ FREE(2) IPARAM(I)
11    CONTINUE
X     TYPE"READ IN PRAM"
      CALL CLOSE(2,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON ",PARAM(1),IER
      CALL OPEN(3,COFA,3,IER)
      IF(IER .NE. 1)TYPE" OPEN ERROR ON ",COFA(1),IER
      CALL OPEN(4,PTCHIF,1,IER)
      IF(IER .NE. 1)TYPE" OPEN ERROR ON",PTCHIF(1),IER
      CALL OPEN(1,SPCH,3,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ON ",SPCH(1),IER
      FRMSIZ = IPARAM(2)     ;DETERMINE FRAME SIZE
      IDE = IPARAM(3)        ;DE-EMPH IF EQ TO 1
      RATIO1 = RPARAM(7)     ;RATIO OF UNVOCD TO VOICED IMPULSE
      IPUL = IPARAM(4)
      PER1 = RPARAM(8)
      PER2 = RPARAM(9)
      DO 400 I=1,180         ;INITIALIZE THE SPEECH ARRAY
      S(I) = 0.0
      U(I) = 0.0
400   CONTINUE
      READ BINARY(3) NORDER  ;DETERMINE FILTER ORDER
      READ(4,470) VOICD2,SIL2,PP2
470   FORMAT(3X,3(I10,3X))
      NVERYF = 1
      LASTD = 0
      PPPOS2 = 1
      PPPOS1 = 1
      NPOS = 1
      PP1 = 0
      NBEGP = 1
      NFILP = 1
      NVOLD = 0
      S40 = 0.0
      S60 = 0.0
      S30 = 0.0
C
C***INITIALIZE FILTER
C
      DO 490 K=1,NORDER
      READ BINARY(3)FILTER(K)
490   CONTINUE
      READ BINARY(3)G
C
C***START SPEECH LOOP
C
675   CONTINUE
C
C*** CHECK FOR PROPER PITCH BOUNDARY PLACEMENT
C
680   IF(PPPOS2 .GE. NBEGP) GO TO 700
      PPPOS1 = PPPOS2        ;UPDATE PITCH PERIOD BOUNDARY
      PP1 = PP2
      NVOLD = VOICD2         ;STORE OLD VOIC/UN SWITCH
      READ(4,470) VOICD2,SIL2,PP2
      PPPOS2 = PPPOS2 + 80   ;NOTE: THIS VALUE MAY BE GREATER THAN 256
      GO TO 680
700   CONTINUE
      IF((VOICD2 .EQ. 0).OR.(NVOLD .EQ. 0)) GO TO 800  ;VOICED/UNVOICED SW
      NDIF = PP2 - PP1       ;CALCULATE PITCH PERIOD FOR FRAME BEG
      NPOSDF = NBEGP - PPPOS1
      RATIO = FLOAT(NPOSDF)/80.0
      IF(NVERYF .EQ. 1)PP1 = PP2
      NVERYF = 0
      PPF = INT(RATIO * FLOAT(NDIF)) + PP1   ;INTERPOLATE PITCH
      CALL VOICED(U,PPF,ALPHA,IPUL,PER1,PER2)
      ISIZE = PPF            ;KEEPS TRACK OF PITCH PERIOD SENT TO THROAT
790   CONTINUE
      GO TO 900
800   CONTINUE
      ISIZE = 80
      CALL UNVOCD(U,ISIZE,ALPHA,IX,SIL2,RATIO1)
900   CONTINUE
      CALL THROAT(U,ISIZE,FILTER,NORDER,G,S,W,FRMSIZ,NBEGP,
     X LASTD,NFILP,SIL2,KS)
C
C***OUTPUT SPEECH
C
      DO 970 MV=1,ISIZE
C
C*** DE-EMPHASIZE SPEECH
C
940   IF(IDE .EQ. 0)GO TO 990
      S60 = S(MV)
      S30 = S60 + .9 * S40
      IF((S30 .LT. .001).AND.(S30 .GT. -.001))S30=0.0
      S40 = S30
      INTS(MV) = IFIX(S30)
      GO TO 960
990   INTS(MV) = IFIX(S(MV))
960   CONTINUE
970   CONTINUE

C
C*** WRITE OUT SPEECH
C
1000  IP = IP + ISIZE
      L = 1
      IF(IP .GE. 256)GO TO 1210      ;SPLIT S(I) & WRBLK(...,X,...)
      DO 1200 I=IS,IP
      X(I) = INTS(L)
      L = L + 1
1200  CONTINUE
      GO TO 1240
1210  CONTINUE
      IP = IP - 256                  ;RESET IP
      DO 1220 I=IS,256
      X(I) = INTS(L)                 ;LOAD UP X(I)
      L = L + 1
1220  CONTINUE
      CALL WRBLK(1,KS,X,1,IER)
      IF(IER .EQ. 9)GO TO 9000       ;END OF FILE
      IF(IER .NE. 1)TYPE"WRBLK ERROR ON FILE #2",IER
      IF(IP .EQ. 0)GO TO 1230
      DO 1230 I=1,IP
      X(I) = INTS(L)                 ;RESTART LOAD UP OF X(I)
      L = L + 1
1230  CONTINUE
      KS = KS + 1
1240  IS = IP + 1
      IF(LASTD .NE. 1)GO TO 675      ;ZERO OUT TO END OF FILE
      IF(KS .GE. 88)GO TO 9000
      DO 1300 I=IP,256
      INTS(I) = 0
1300  CONTINUE
      IP = 1
      GO TO 1220
      GO TO 675                      ;CONTINUE CREATING SPEECH
9000  CONTINUE
      CALL DVDCHK(ICODE1)
      IF(ICODE1 .EQ. 1)TYPE"DIVIDE BY ZERO OCCURRED"
      CALL OVERFL(ICODE2)
      IF(ICODE2 .EQ. 1)TYPE"OVERFLOW OCCURRED"
      IF(ICODE2 .EQ. 3)TYPE"UNDERFLOW OCCURRED"
      STOP
      END
      SUBROUTINE THROAT(U,ISIZE,FILTER,NORDER,G,S,W,FRMSIZ,NBEGP,
     X LASTD,NFILP,NSIL1,IN)
C**********************************************************************
C
C  THIS SUBROUTINE INPUTS U (A SEQUENCE OF VOICED/UNVOICED
C  INPUTS) AND PASSES THEM THROUGH A TIME VARYING DIGITAL FILTER
C  TO PRODUCE AN OUTPUT SPEECH SEQUENCE.
C
C  INPUTS: U     -IS THE EXCITATION TO THE "THROAT" DIGITAL FILTER
C                 WHICH CAN BE EITHER VOICED OR UNVOICED.
C          ISIZE -IS THE LENGTH OF THE INPUT EXCITATION U.
C          FILTER-IS THE VALUES OF THE COEFFICIENTS (AND KEEPS
C                 OLD VALUES BETWEEN CALLS)
C          NORDER-IS THE ORDER OF THE FILTER
C          G     -IS THE GAIN
C          S     -IS THE OUTPUT SPEECH SEQUENCE
C          W     -IS THE VALUE OF THE FILTER AT EACH NODE
C          FRMSIZ-IS THE NUMBER OF POINTS BETWEEN COEFFICIENT UPDATES
C          NBEGP -KEEPS TRACK OF THE CURRENT POSITION (WHEN THIS EXCEEDS
C                 A FRAME BOUNDARY NEW COEFFICIENTS ARE READ IN)
C          LASTD -WHEN THIS EQUALS ONE THE PROGRAM HAS READ THE LAST
C                 COEFFICIENT SET
C          NFILP -KEEPS TRACK OF ABSOLUTE FRAME POSITION
C          NSIL1 -A ZERO INDICATES SILENCE, A ONE IS SPEECH
C          IN    -CURRENT BLOCK NUMBER
C
C**********************************************************************
      DIMENSION U(180),FILTER(20),W(0:20),T(80),S(180)
      INTEGER FRMSIZ
X     DO 20 J=1,80
X     T(J) = 0.0
X20   CONTINUE
      IF(NSIL1 .EQ. 1)GO TO 30
      DO 29 I=1,20
      W(I) = 0.0
29    CONTINUE
30    DO 900 N=1,ISIZE
      IF(NBEGP .LT. (NFILP + FRMSIZ))GO TO 307
      DO 300 K=1,NORDER
      READ BINARY(3,END=309)FILTER(K)
300   CONTINUE
      READ BINARY(3,END=309)G
      NFILP = NFILP + FRMSIZ
      GO TO 307
309   LASTD = 1
307   CONTINUE
      TOTAL = 0.0
      DO 400 K=1,NORDER
      TOTAL = TOTAL - W(K) * FILTER(K)
400   CONTINUE
      W(0) = TOTAL + G * U(N)
X     T(N) = W(0)
      IF(NSIL1 .EQ. 1)GO TO 429
      S(N) = 0.0
      GO TO 436
429   S(N) = W(0)
436   CONTINUE
C     IF((IN .LT. 83).OR.(IN .GT. 84))GO TO 437
C     TYPE"U(N) ",U(N),"S(N) ",S(N),"N ",N,"BLOCK",IN,NSIL1
C     TYPE"SIL1",NSIL1


C
C  THIS SUBROUTINE CREATES A NORMAL RANDOM SEQUENCE WHICH WILL
C  BE AN INPUT TO THROAT
C
C  INPUTS: FRMSIZ-LENGTH OF THE OUTPUT SEQUENCE U(N)
C          G     -SCALING VALUE (TYPICALLY 1000.0)
C          IX    -DOUBLE PRECISION SEED (GOOD INITIAL VALUE IS 203.0)
C          SIL2  -(1-UNVOICED, 0-SILENT)
C          RATIO -SCALE FACTOR (TYPICALLY .1) FRACTION OF G USED
C                 RELATIVE TO VOICED SPEECH
C
C  OUTPUT: U     -A NORMALLY GENERATED RANDOM SEQUENCE OF LENGTH FRMSIZ
C                 IF SIL2=1 ELSE A SEQUENCE OF 0'S (SAME LENGTH).
C
      SUBROUTINE UNVOCD(U,FRMSIZ,G,IX,SIL2,RATIO)
      DOUBLE PRECISION U1,V,W,T,X,E,E2,E10,E3,Z,P,P1,F,P2,P3,P4,P5,PI
     X ,PI2,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A12,A13,A14,A15,A16,A17,A18
      DOUBLE PRECISION INTEGER IX
      INTEGER FRMSIZ,SIL2
      DIMENSION U(180)

DATA  At/.  88407040229873800/. 

X  A2/1.  13113163344418000/. 

X  A3/.  98663347708694900/. 

X  A4/.  9387 20824790 463 DO/. 

X  AS/.  63083480192196000/. 

X  A6/.  7SSS91 S3 166760 100/. 

X  A7/.  0342403037 SO 1 1  IDO/. 

X  A8/.  91 131278028870300/ . 

X  A9/ .  47972740422244 IDO/. 

X  A10/1.  103473661 0220700/ . 

X  A12/.  872834976671 79000/ . 

X  A13/.  04926449637312800/. 

X  A 14/.  S9SS0713801 3940100/. 

X  A1S/.  80SS77924423817D0/. 

X  A16/.  03337734930688600/ . 

X  A17/.  973310934 17389800/ . 

X  E/2.  2 16033867 16647 IDO/. 

X  A1B/.  18002319106836300/. 

C  • 

C***SCALE  FOR  SILENCE 
C 

IF ( SIL2  .  NE.  0)00  TO  300 
DO  400  1-1. 180 
U(I)  -  0.  O 
400  CONTINUE 

00  TO  2300 
300  CONTINUE 

C 

C***  CALCULATE  THE  NORMAL  FUNCTION 
C 

PI  -  3.  141392633600 
PI2  -  (PI*2. 0)**(-.  3> 

E2  -  E**2.  O 
E3  -  E2/2.  0 
DO  2200  N  -  1. FRMSIZ 
Ul  -  DRAND(IX) 

IF(U1  .  OT.ADCO  TO  1000 
V  -  ORANO(IX) 
      X = E * (A2 * U1 + V - 1)

GO  TO  9000 

IF<U1  . LT  A 1 7 > GO  TO  1200 

V  »  DR AND (IX) 

W  -  DR AND ( IX) 

T  -  E3  -  DLOG(W) 

E10  ■  T  *  V*#2 

IF(E10  .  CT.  E3)C0  TO  1005 

IF1U1  LT  A3 >60  TO  1010 

X  -  -<2.  0  *  T)*».  3 

00  TO  9000 

X  •  <2.  0  *  T>*».  5 

00  TO  9000 

IF(U1  LT.  A4)C0  TO  1300 

V  -  DR AND < I X  > 

W  «  DR AND ( IX) 

Z  »  V  -  U 

T  «  E  -  AS  *  DHINKV.  W) 

P  -  DHAXKV.M) 

IF<P  .  LT.  A6>C0  TO  1800 
PI  -  A7  *  DABS ( Z  > 

F  -  PI2*D£XP<-T*T/2.  0)-A18*<E-DABS(T) ) 
IF(P1  .  LE.  F)GO  TO  1900 
OO  TO  1300 

IF(U1  .  LE.  AS ICO  TO  1700 

V  -  DR AND ( IX ) 

U  -  DR AND ( IX) 

Z  -  V  -  u 

T  -  A9  ♦  A10  *  DHINKV,  W> 

P2  -  OHAXKV,  U> 

IF(P2  LE.  A12)G0  TO  1000 
P3  -  A13  •  DABS  < l ) 

F  »  PI2  *  DEXP(-T*T/2.  O ) -A 1B*( E-DABS ( T> ) 
1F<P3  . LE.  F)CO  TO  1800 
00  TO  1600 

V  •  BRAND ( IX ) 

U  -  DR AND < IX ) 

Z  -  V  -  W 

T  -  A9  -  A14  *  DHINKV.  U> 

P4  -  DHAXKV.U) 

IF<P4  .LE.  A131G0  TO  1800 
PS  «  A16  •  DABS(Z) 

F  ■  PI2  *  DEXP<-T*T/2.  0>-A18*<E-DABS<T> > 
IF«P3  LE.  F)CO  TO  1800 
GO  TO  1700 

IF( Z  .  LT.  0.  0)C0  TO  1900 

X  -  -T 

GO  TO  2000 

X  -  T 

CONTINUE 

CONTINUE 

U<N)  *  (0)  *  SNOL(X)  *  RATIO 
TYPE  U<N> 

CONTINUE 

CONTINUE 

RETURN 

END 
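
The branch structure above (a central triangle, a tail beyond E, and three wedge regions with rejection tests) matches the Kinderman-Ramage normal generator. The following Python sketch restates that structure under stated assumptions; it is not the thesis code. The names XI, C1, C2 and kr_normal are hypothetical. The first three constants (A1, A2, A3 in the listing) are not visible in this part of the listing and are taken here from the published algorithm. The guard against a negative T is an addition that the listing does not contain.

```python
import math
import random

# Assumed constants of the Kinderman-Ramage mixture.
XI = 2.216035867166471                 # E in the listing
C1 = 1.0 / math.sqrt(2.0 * math.pi)    # PI2 in the listing
C2 = 0.180025191068563                 # A18 in the listing


def _g(t):
    """Density residual used by the wedge acceptance tests (F in the listing)."""
    return C1 * math.exp(-t * t / 2.0) - C2 * (XI - abs(t))


def kr_normal(rng):
    """Draw one standard normal deviate using uniform deviates from rng."""
    u1 = rng.random()
    if u1 < 0.884070402298758:                       # central triangle
        return XI * (1.131131635444180 * u1 + rng.random() - 1.0)
    if u1 >= 0.973310954173898:                      # tail, |x| > XI
        while True:
            v, w = rng.random(), rng.random()
            t = XI * XI / 2.0 - math.log(1.0 - w)    # 1 - w avoids log(0)
            if t * v * v <= XI * XI / 2.0:
                x = math.sqrt(2.0 * t)
                return x if u1 < 0.986655477086949 else -x
    if u1 >= 0.958720824790463:                      # outer wedge
        base, slope = XI, -0.630834801921960
        pmax, scale = 0.755591531667601, 0.034240503750111
    elif u1 >= 0.911312780288703:                    # middle wedge
        base, slope = 0.479727404222441, 1.105473661022070
        pmax, scale = 0.872834976671790, 0.049264496373128
    else:                                            # inner wedge
        base, slope = 0.479727404222441, -0.595507138015940
        pmax, scale = 0.805577924423817, 0.053377549506886
    while True:
        v, w = rng.random(), rng.random()
        t = base + slope * min(v, w)
        if t < 0.0:                                  # added guard, not in listing
            continue
        if max(v, w) <= pmax or scale * abs(v - w) <= _g(t):
            return t if v < w else -t
```

A frame of unvoiced excitation would then be a list of such deviates multiplied by the gain, as the listing does with G and RATIO.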


      SUBROUTINE VOICED(U,PPF,G,IPUL,PER1,PER2)
C
C***THE PURPOSE OF THIS SUBROUTINE IS TO PRODUCE AN INPUT TO THE
C   SYNTHESIS FILTER: A GLOTTAL PULSE SHAPE.
C   (EACH TIME THIS ROUTINE IS CALLED ONE PULSE IS PRODUCED)
C
C   INPUTS: PPF -PITCH PERIOD (SIZE IN SAMPLES)
C           G   -A REAL VALUE FOR THE PULSE MAXIMUM
C           IPUL-(1-PULSE, 0-IMPULSE)
C           PER1-THE LENGTH OF THE POS. SLOPE PORTION OF PULSE
C           PER2-THE LENGTH OF THE NEG. SLOPE PORTION OF PULSE
C
C   OUTPUT: U   -THIS IS A GLOTTAL PULSE THAT BECOMES THE INPUT TO
C                SUBROUTINE THROAT
C
C                 NP        NN
C      |<---TP--->|<---TN--->|
C      |<--------- 1 PITCH PERIOD --------->|
C
      DIMENSION U(180)
      INTEGER FRMSIZ,PPF
      PI = 3.14159
      PI2 = PI/2.0
      TIME = 1.0
      NPOS =1
      TP = PER1 * FLOAT(PPF)
      TN = PER2 * FLOAT(PPF)
C
C***CALCULATE ONE FRAMES WORTH OF U
C
      NP = INT(TP)
      NN = INT(TP + TN)
      DO 600 I=1,PPF
      IF(IPUL .EQ. 1)GO TO 100
      IF(I .GT. 1)GO TO 300
      U(I) = G * 2.0
      GO TO 500
  100 IF(I .GT. NP)GO TO 200
      U(I) = (G)*(1.0-COS(TIME*PI/TP))
      GO TO 500
  200 IF(I .GT. NN)GO TO 300
      U(I) = G*COS(((TIME-TP)/TN)*PI2)
      GO TO 500
  300 U(I) = 0.0
  500 TIME = TIME + 1.0
X     WRITE(12,900)I,U(I)
X900  FORMAT(1X,I5,1X,G16.7)
  600 CONTINUE
      RETURN
      END
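
The pulse shape produced by VOICED can be restated compactly. The Python sketch below follows the listing's arithmetic (a raised-cosine rise over TP samples, a quarter-cosine fall over TN samples, zero for the rest of the period, or a single impulse of height 2G). The function name glottal_pulse and the list return value are illustrative assumptions, not part of the thesis code.

```python
import math


def glottal_pulse(ppf, g, ipul, per1, per2):
    """Return one pitch period (ppf samples) of excitation as a list."""
    u = [0.0] * ppf
    if ipul != 1:                      # impulse excitation: first sample only
        if ppf > 0:
            u[0] = 2.0 * g
        return u
    tp = per1 * ppf                    # rising-slope length in samples
    tn = per2 * ppf                    # falling-slope length in samples
    n_pos = int(tp)                    # NP in the listing
    n_neg = int(tp + tn)               # NN in the listing
    for i in range(1, ppf + 1):        # i mirrors the 1-based index (TIME = I)
        if i <= n_pos:
            u[i - 1] = g * (1.0 - math.cos(i * math.pi / tp))
        elif i <= n_neg:
            u[i - 1] = g * math.cos((i - tp) / tn * math.pi / 2.0)
    return u
```

For example, glottal_pulse(80, 1.0, 1, 0.5, 0.25) rises for 40 samples, falls for 20, and is zero for the last 20.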
C
C     PROGRAM:   SIFT
C     AUTHOR:    WILL JANSSEN AND CRAIG MCKOWN
C     DATE:      19 JULY 83 UPDATED 26 AUG 83
C     LANGUAGE:  FORTRAN5
C
C     FUNCTION:  THIS PROGRAM SETS UP AND RUNS THE SIFT PITCH DETECTOR
C                AS DESCRIBED BY MARKEL AND GRAY WITH A SILENCE
C                DETECTOR ADDED.
C
C     LOAD COMMAND LINE:  RLDR PITCHERS SIFTA DIRECT AUTO ENER
C                         IOF @FLIB@
C
C     RUN LINE:  SIFT FILE1/I FILE2/O FILE3/I
C
C     INPUTS:  SPEEFL-IS THE INPUT SPEECH FILE AND IS REPRESENTED
C                     BY FILE1 ABOVE
C              PARAM -THIS INPUT FILE (OUTPUT OF SETUP) CONTAINING
C                     25 RECORDS OF REAL DATA THEN 25 RECORDS OF
C                     INTEGER DATA. THIS DATA IS USED TO INITIALIZE
C                     THIS FILE. DATA FROM THIS FILE USED IN THIS
C                     PROGRAM:
C                     RECORD#6-RPARAM(6) IS THE SILENCE THRESHOLD
C
C     OUTPUTS: DUMMY -THIS IS THE OUTPUT PITCH FILE WHERE EACH
C                     RECORD REPRESENTS THE PITCH INFORMATION
C                     FOR EACH 80 POINTS. THE RECORD FORMAT IS
C                     ON LINE 849.
C
C********************************************************************
C

      INTEGER SPEEFL(7),PARAM(7),DUMMY(7),RUMMY(7)
      INTEGER VAL(768),TRPIT(400),NE(400),NSPMAX(256)
      DIMENSION SPCH(400),PBUF(100),PITCH(3),RPARAM(25),IPARAM(25)
      DIMENSION PIT(400)
C
      MAXSET = 400
      MAXPT = 80
C
      DO 30 I=1,400
      NE(I) = 0
   30 CONTINUE
      NFILES = 3
      CALL IOF(NFILES,MAIN,SPEEFL,DUMMY,PARAM,RUMMY,MS,S1,
     X S2,S3,S4)
      CALL OPEN(1,SPEEFL,1,IER)
      IF (IER.NE.1) TYPE "OPEN FILE ERROR ",IER
      CALL OPEN(3,PARAM,3,IER)
      IF(IER .NE. 1)TYPE"OPEN FILE ERROR PARAM ",IER
      DO 40 I=1,25
      READ FREE(3) RPARAM(I)
   40 CONTINUE
      DO 41 I=1,25
      READ FREE(3) IPARAM(I)
   41 CONTINUE
C
C***  FIND MAX VALUE OF SPEECH
C
      IBIG = 88
      MBLOCK = 1


      NV = 0
      N600 = 0
      N500 = 0
 1041 CONTINUE
      CALL RDBLK(1,NV,NSPMAX,MBLOCK,IENDS)
      DO 1042 J=1,256

      N550 = IABS(NSPMAX(J))
      IF (N550 .GT. N500) N500 = N550

      N600 = N600 + 1
 1042 CONTINUE
      NV = NV + 1
      IF((NV.LT.IBIG).AND.(IENDS.NE.9))GO TO 1041
      CALL REWIND(1)
      S200 = 32000.0/FLOAT(N500)
      THRESH = RPARAM(6)
      PITCH(1)=0.0
      PITCH(2)=0.0
      PITCH(3)=0.0
      NSET = 0
      S = 1
      K = 0
   70 CONTINUE
      IF (NSET.GE.MAXSET) GO TO 160
      NSET = NSET + 1
      NPOINT = 0
      CALL RDBLK(1,K,VAL,3,IER)
      IF (IER.EQ.9) GO TO 160
      IF (IER.NE.1) TYPE "READ FILE ERROR ",IER
      F = S + 399
      DO 100 J=S,F
      NPOINT=NPOINT+1
      SPCH(NPOINT) = (FLOAT(VAL(J))/504.5) * S200
  100 CONTINUE
C
C***  DETERMINE IF ENERGY THRESHOLD IS EXCEEDED
C
      CALL ENER(SPCH,THRESH,NEN)
      NE(NSET) = NEN
X     TYPE"NEN= ",NEN
C
C***CALL TO THE SUBROUTINES WHICH PERFORM PITCH ANALYSIS
C
      CALL SIFTA(SPCH,PITCH)
      DELAY = NSET - 2
      IF (PITCH(3).EQ.0.0) GO TO 110
      PIT(DELAY) = (PITCH(3) - 1)*4
      GO TO 120
  110 PIT(DELAY) =0.0
  120 CONTINUE
      S = S + MAXPT
      IF (S.LE.256) GO TO 70
      S = S - 256
      K = K+1
      GO TO 70
  160 CONTINUE
      MSET=NSET               ; TOTAL # OF SETS
      CALL CLOSE(1,IER)
      IF (IER.NE.1) TYPE "CLOSE FILE ERROR ",IER
      DO 180 I = 1,MSET - 2
      TRPIT(I) = INT(PIT(I))
  180 CONTINUE
      DO 190 I = MSET - 1,MAXSET
      TRPIT(I) = 0
  190 CONTINUE
      CALL DFILW(DUMMY,IER)
      IF (IER.EQ.13) GOTO 200
      IF (IER.NE.1) TYPE "DELETE ERROR ",IER
  200 CALL CFILW(DUMMY,2,IER)
      IF (IER.NE.1) TYPE "CREATE FILE ERROR ",IER
      CALL OPEN(2,DUMMY,3,IER)
      DO 900 J=1,MAXSET
      NSIL = 1
      IF(TRPIT(J) .EQ. 0)NSIL = 0
      IF(NE(J) .EQ. 1)GO TO 300
      NSIL = 0
      TRPIT(J) = 0
  300 WRITE(2,849)NSIL,NE(J),TRPIT(J)
  849 FORMAT(3X,3(I10,3X))
  900 CONTINUE
      IF (IER.NE.1) TYPE "WRITE BLOCK ERROR ",IER
      STOP
      END
C
C     SIFT ALGORITHM PROCESSING - STEP1
C
C     INPUT PARAMETERS:  SPCH(J) (J=1,2,...,400)
C        THE SPEECH SIGNAL TO BE PROCESSED FOR PITCH
C
C     OUTPUT PARAMETER:  PITCH(J) (J=1,2,3)
C        THE PITCH IN UNITS OF [ ]
C
C     NOTE:  PARAMETERS FIXED FOR FS=8 KHZ
C

      SUBROUTINE SIFTA(SPCH,PITCH)
      DIMENSION SPCH(1),PBUF(100),AF(4),PF(4),DF(5),D(5),ABUF(33)
      DIMENSION U(100),A(5),P(5),RC(5),PITCH(1)
      DATA AF/1.,-2.340366,2.011900,-.614109/
      DATA PF/.0357082,-.0069956,-.0069956,.0357082/
      DATA P/1.,4*0./
C***  INITIALIZE MEMORY OF DIRECT TO ZERO
      DO 10 J=1,5
      DF(J)=0.
      D(J)=0.
   10 CONTINUE
C***  PRE-FILTER, DOWN-SAMPLER, DIFFERENCER AND HAMMING WINDOWER.
      UPREV=0.
      DO 20 J=1,400
      CALL DIRECT(AF,PF,3,DF,SPCH(J),SOUT)
      IF (MOD(J,4).NE.0) GO TO 20
      K=J/4
      PBUF(K)=SOUT
      U(K)=(SOUT-UPREV)*(.54-.46*COS((K-1.)*6.28318/99.))
      UPREV=SOUT
   20 CONTINUE
C***  COMPUTE INVERSE FILTER COEFFICIENTS
      CALL AUTO(100,U,4,A,ALP,RC)
C***  PERFORM INVERSE FILTERING AND HAMMING WINDOW
      DO 30 J=1,100
      CALL DIRECT(P,A,4,D,PBUF(J),FOUT)
      IF (J.LE.4) GO TO 30
      PBUF(J-4)=FOUT*(.54-.46*COS((J-5)*6.28318/95.))
   30 CONTINUE
C***  SIFT ALGORITHM PROCESSING - STEP2
C***
C     INPUT PARAMETER:  PBUF(J) (J=1,2,...,76)
C        THE DOWN-SAMPLED, FILTERED ERROR SIGNAL
C***  PERFORM AUTOCORRELATION ON PITCH BUFFER
      DO 25 JJ=1,33
      J=JJ-1
      NMJ=76-J
      SUM=0.
      DO 15 I=1,NMJ
      IPJ=I+J
      SUM=SUM+PBUF(I)*PBUF(IPJ)
   15 CONTINUE
      ABUF(JJ)=SUM
   25 CONTINUE
C***  OBTAIN PITCH VALUES FROM LAST THREE FRAMES
      P1=PITCH(1)
      P2=PITCH(2)
      P3=PITCH(3)
C***  GET PEAK WITHIN RANGE [6,32]
      L=6
      AMAX=ABUF(L)
      DO 35 J=6,32
      IF(ABUF(J).LE.AMAX) GO TO 35
      AMAX=ABUF(J)
      L=J
   35 CONTINUE
C***  TEST FOR MAX EQUAL ZERO
      IF (AMAX.EQ.0.) GO TO 60
C***  TEST FOR LEFT HAND EDGE.
C***  IF ABUF(L) IS NOT A PEAK, SET UNVOICED
      IF (ABUF(L).LT.ABUF(L-1)) GO TO 60
C***  PERFORM PARABOLIC INTERPOLATION
C***  ABOUT LOCATION L
      AA=ABUF(L-1)-ABUF(L)
      AA=(AA+ABUF(L+1)-ABUF(L))/2.
      BB=(ABUF(L+1)-ABUF(L-1))/4.
      AP=ABUF(L)-BB*BB/AA
      AL=L-BB/AA
      V=AP/ABUF(1)
C***  TEST WITH VARIABLE THRESHOLD
      IF (L.GE.19) GO TO 40
      DD=-1.*(L-6.)/13.+2.
      GO TO 50
   40 CONTINUE
      DD=-1.*(L-19.)/13.+1.
   50 CONTINUE
      V=V/DD
C***  DECISIONS
      IF(V.GE..37) GO TO 70
      IF(P1.EQ.0.) GO TO 60
      IF(V.GE..32) GO TO 70
   60 P0=0.
      GO TO 80
   70 P0=AL
   80 IF(ABS(P1-P3).LE..375*P3) P2=(P1+P3)/2.
C     IF (P0 AND P1 ARE CLOSE) AND (P2 NOT 0)
C     BUT P3=0, THEN USE LINEAR EXTRAPOLATION FOR P2
C***  (COMING OUT OF VOICED).
      IF (P3.NE.0.) GO TO 90
      IF(P2.EQ.0.) GO TO 90
      IF (ABS(P0-P1).GT.0.2*P1) GO TO 90
      P2=(2.*P1)-P0
C***  TEST FOR ISOLATED "VOICED" AND INCORRECT END OF "VOICED"
   90 IF (P1.NE.0.) GO TO 100
      IF (ABS(P2-P3).GT.(.375*P3)) P2=0.
C***  UPDATE FRAMES
  100 PITCH(3)=P2
      PITCH(2)=P1
      PITCH(1)=P0
C***  TRUE PITCH DELAYED BY TWO FRAMES EQUALS:
C***  (PITCH(3)-1)*4
      RETURN
      END
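
The parabolic interpolation step in SIFTA fits a parabola through the autocorrelation peak and its two neighbours to obtain a fractional peak location (AL) and an interpolated peak value (AP). The Python sketch below reproduces that arithmetic on a 0-based list; the function name parabolic_peak and the zero-curvature fallback are illustrative assumptions.

```python
def parabolic_peak(r, l):
    """Refine a peak at 0-based index l of sequence r.

    Returns (location, value) of the vertex of the parabola through
    r[l-1], r[l], r[l+1]. AA and BB follow the listing's definitions.
    """
    aa = (r[l - 1] + r[l + 1] - 2.0 * r[l]) / 2.0   # half the second difference
    bb = (r[l + 1] - r[l - 1]) / 4.0                # half the centred slope
    if aa == 0.0:                                   # flat top: nothing to refine
        return float(l), r[l]
    return l - bb / aa, r[l] - bb * bb / aa
```

Applied to samples of an exact parabola, the routine recovers the true vertex, which is a convenient sanity check before using it on autocorrelation data.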
      SUBROUTINE ENER(SPCH,THRESH,NEN)
C
C     THIS SUBROUTINE DETERMINES WHETHER A FRAMES ENERGY
C     EXCEEDS A SILENCE THRESHOLD
C
C     INPUTS:  SPCH  -THIS IS ONE FRAME OF SPEECH
C              THRESH-THIS IS THE SILENCE THRESHOLD
C
C     OUTPUTS: NEN   -THIS IS A SILENCE INDICATOR WHERE A ZERO IS
C                     FOR SILENCE AND A ONE FOR SPEECH
C
C********************************************************************
      DIMENSION SPCH(400)
      NEN = 1
      SUM = 0.0
      DO 100 J=1,400
      S1 = SPCH(J)/100.0
      SUM = SUM + S1 * S1
  100 CONTINUE
      IF(SUM .LT. THRESH)NEN = 0
      RETURN
      END
C
C     THE SUBROUTINE AUTO AS PRESENTED IN MARKEL & GRAY
C     THIS SUBROUTINE COMPUTES THE AUTOCORRELATION THEN INVERTS
C     THE MATRIX TO FIND COEFFICIENTS
C
C     INPUTS:  N     -SIZE OF THE AUTOCORRELATION
C              X     -SEQUENCE BEING AUTOCORRELATED
C              M     -ORDER OF THE FILTER MINUS ONE
C
C     OUTPUTS: A     -INVERTED COEFFICIENTS
C              ALPHA -THE CURRENT SYSTEM GAIN
C              RC    -THE REFLECTION COEFFICIENTS
C
C********************************************************************
      SUBROUTINE AUTO(N,X,M,A,ALPHA,RC)
      DIMENSION X(1),A(1),RC(1)
      DIMENSION R(51)
C
      MP=M+1
      DO 15 K=1,MP
      R(K)=0.
      NK=N-K+1
      DO 10 NP=1,NK
      R(K)=R(K)+X(NP)*X(NP+K-1)
   10 CONTINUE
   15 CONTINUE
      RC(1)=-R(2)/R(1)
      A(1)=1.
      A(2)=RC(1)
      ALPHA=R(1)+R(2)*RC(1)
      DO 40 MINC=2,M
      S=0.
      DO 20 IP=1,MINC
      S=S+R(MINC-IP+2)*A(IP)
   20 CONTINUE
      RC(MINC)=-S/ALPHA
      MM=MINC/2+1
      DO 30 IP=2,MM
      IB=MINC-IP+2
      AT=A(IP)+RC(MINC)*A(IB)
      A(IB)=A(IB)+RC(MINC)*A(IP)
      A(IP)=AT
   30 CONTINUE
      A(MINC+1)=RC(MINC)
      ALPHA=ALPHA+RC(MINC)*S
      IF (ALPHA) 50,50,40
   40 CONTINUE
   50 RETURN
      END
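
AUTO computes the autocorrelation sequence and then solves the normal equations with the Levinson-Durbin recursion. The Python sketch below performs the same recursion, assuming 0-based lists and a non-zero signal (so that the zero-lag autocorrelation is positive); the function name auto_lpc and the use of a copied coefficient list instead of the listing's in-place symmetric update are illustrative choices, not the thesis implementation.

```python
def auto_lpc(x, m):
    """Autocorrelation-method linear prediction of order m.

    Returns (a, alpha, rc): a[0] == 1 and a[1..m] are the inverse filter
    coefficients, alpha is the residual energy, rc the reflection
    coefficients.
    """
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(m + 1)]
    a = [1.0] + [0.0] * m
    rc = [0.0] * m
    alpha = r[0]
    for order in range(1, m + 1):
        # Correlation of the current predictor with the next lag (S).
        s = sum(r[order - ip] * a[ip] for ip in range(order))
        k = -s / alpha
        rc[order - 1] = k
        prev = a[:]
        for ip in range(1, order):
            a[ip] = prev[ip] + k * prev[order - ip]
        a[order] = k
        alpha += k * s          # equals alpha * (1 - k*k)
        if alpha <= 0.0:        # same early exit as the arithmetic IF
            break
    return a, alpha, rc
```

On a decaying exponential x[n] = 0.5**n the first coefficient comes out close to -0.5 and the second close to zero, as expected for a first-order process.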
      SUBROUTINE DIRECT(A,P,M,D,XIN,XOUT)
C
C     THIS ROUTINE IMPLEMENTS THE DIRECT FORM FILTER
C
C     INPUTS:  A   -THIS IS THE SET OF FILTER COEFFICIENTS
C                   FOR THE POLES
C              P   -THIS IS THE SET OF FILTER COEFFICIENTS
C                   FOR THE ZEROS
C              M   -THIS IS THE ORDER OF THE FILTER MINUS ONE
C              D   -THIS IS THE SET OF INTERMEDIATE NODAL VALUES
C              XIN -INPUT TO THE FILTER
C
C     OUTPUTS: XOUT-THIS IS THE FILTER OUTPUT
C
      DIMENSION A(1),P(1),D(1)
      XOUT = 0.0
      D(1) = XIN
      DO 10 J = 1,M
      I = M + 1 - J
      XOUT = XOUT + D(I+1)*P(I+1)
      D(1) = D(1) - A(I+1)*D(I+1)
      D(I+1) = D(I)
   10 CONTINUE
      XOUT = XOUT + D(1)*P(1)
      RETURN
      END
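
DIRECT processes one sample per call and keeps its delay line in D between calls. The Python sketch below mirrors that update order on 0-based lists; the function name direct and the convention of returning the output sample (instead of writing to XOUT) are assumptions made for illustration.

```python
def direct(a, p, m, d, xin):
    """Pass one sample through a direct-form filter.

    a: pole coefficients (a[0] is taken as 1), p: zero coefficients,
    d: delay line of length m + 1, updated in place between calls.
    """
    d[0] = xin
    xout = 0.0
    for i in range(m, 0, -1):
        xout += d[i] * p[i]      # feed-forward contribution of tap i
        d[0] -= a[i] * d[i]      # feedback into the newest node
        d[i] = d[i - 1]          # shift the delay line
    return xout + d[0] * p[0]
```

Calling it repeatedly with the same d list filters a sequence sample by sample; with a = [1, -0.5] and p = [1, 0] an impulse input yields 1, 0.5, 0.25, and so on.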



C
C     PROGRAM:   PITCH2
C     AUTHOR:    WILL JANSSEN
C     DATE:      22 APRIL 83
C     LANGUAGE:  FORTRAN5
C
C     FUNCTION:  THIS PROGRAM INPUTS A SPEECH FILE(NO PARTICULAR
C        LENGTH REQUIRED) AND PRODUCES A FILE OF FRAME
C        INFORMATION CONTAINING A VOICED/UNVOICED DECISION
C        AND A SILENCE DECISION FOR EACH FRAME. THE BASIC ALGORITHM OF
C        THIS PITCH DETECTOR COMES FROM J.J. DUBNOWSKI, R.W. SCHAFER,
C        AND L.R. RABINER, "REAL-TIME DIGITAL HARDWARE PITCH DETECTOR,"
C        IEEE TRANS. ACOUST., SPEECH, AND SIGNAL PROC., VOL. ASSP-24,
C        NO. 1, FEB 76.
C
C     LOAD COMMAND LINE:  RLDR PITCH IOF LPF ENERGY SETMAX
C                         CLPPER AUTCOR @FLIB@
C
C     RUN LINE:  PITCH2 FILE1/I FILE2/O FILE3/I FILE4/I
C     WHERE:
C        FILE1-IS THE INPUT BINARY SPEECH FILE
C        FILE2-IS THE PITCH DETECTOR OUTPUT INCLUDING
C              A VOICED/UNVOICED INDICATOR, A
C              SILENCE/SPEECH INDICATOR, AND A PITCH
C              PERIOD. SEE LINE 849 FOR THE OUTPUT FORMAT
C        FILE3-CONTAINS A SET OF INITIALIZATION PARAMETERS
C        FILE4-CONTAINS A SET OF FILTER COEFFICIENTS FOR
C              THE LOW PASS FILTER
C
C     WHAT FOLLOWS IS A SHORT DESCRIPTION OF SOME OF THE VARIABLES
C     USED IN THIS PROGRAM:
C
C     PARAMETER LIST-(WHERE A SET IS DEFINED AS 1/3 OF A FRAME)
C
C     NUMSEG-(I) NUMBER OF SEGMENTS(1/3 FRAME) PROCESSED BY PITCH
C     FILSPI-(IS) FILE NAME OF INPUT SPEECH FILE
C     OUTFRM-(IS) FILE NAME OF OUTPUT FRAME INFORMATION
C     NSETS-(I) CURRENT SET NUMBER
C     MXSETS-(I) MAXIMUM NUMBER OF SETS(DETERMINES SPEECH ARRAY SIZE)
C     NPOINT-(I) CURRENT POINT NUMBER IN CURRENT SET
C     MAXPNT-(I) MAXIMUM NUMBER OF POINTS PER ARRAY FILL(SO BLOCKS)
C     SPEECH(A,B)-(R) THE ARRAY OF SPEECH DATA WHERE A IS THE SET
C              NUMBER AND B IS THE POINT IN THE SET
C     LSTSET-(I) IN THE CASE WHERE THERE IS LESS THAN 25 BLOCKS
C              READ IN LSTSET IS THE LAST SET USED
C     ZEROSP-(I) SET TO ONE IF ZEROING WAS REQUIRED FOR ANY SET
C     TFRMAR-(R) THRESHOLD FRAME ARRAY CONTAINS MAX. VALUE IN EACH SET
C              THEN SETS THRESHOLD IN EACH FRAME
C     PNTSET-(I) NUMBER OF POINTS IN EACH SET
C     LSTRND-(I) LASTROUND. WHEN COMPLETED THE PITCH DETECTOR IS DONE
C     RPARAM-(R) REAL RUN PARAMETERS FROM PARAMETER FILE
C     IPARAM-(I) INTEGER RUN PARAMETERS FROM PARAMETER FILE
C     LSTFRM-(I) LAST FRAME IS 2 LESS THAN THE NUMBER OF FRAMES BEING
C              WORKED ON
C     AUTTHR-(R) THRESHOLD FOR DETECTING 2ND AUTOCORRELATION PEAK
C     TMPAUT-(R) THIS ARRAY HOLDS THE AUTOCORRELATION FUNCTION FOR ONE
C              FRAME
C     SPCH-(R) THIS ARRAY HOLDS ONE FRAME OF SPEECH INFORMATION FOR
C              AUTOCORRELATION COMPUTATIONS
C     POS-(I) LOCATION OF AUTOCORRELATION PEAK
C     ISPECH-(I) THE INTEGER VAR THE SPEECH FILE IS READ INTO
C     FRAMAR-(I) ARRAY OF SILENCE, VOICED/UNVOICED DECISIONS, & PITCH
C     MAX-(R) VALUE OF AUTOCORRELATION PEAK
C     OVLSET-(I) THE SET STARTING LOCATION
C
      DIMENSION SPEECH(10,80),TFRMAR(10),FILSPI(7),OUTFRM(7),
     X PARAM(7),RPARAM(25),IPARAM(25),FRAMAR(10,3),FLTRFL(7),
     X LPFSPF(7),ZROCOF(100),S1(2),S2(2),S3(2),S4(2),MAIN(7),
     X OLDSPC(120),TEMPSP(4,80)
      INTEGER FILSPI,OUTFRM,NUMSEG,NSETS,MXSETS,NPOINT,MAXPNT,LSTSET,
     X PNTSET,PARAM,BLOCKS,BEGSET,TLSTST,FRAMAR,ZEROSP,LSTRND,
     X FLTRFL,S1,S2,S3,S4,OVLSET,FOURPS,FILRND,NSPMAX(256),
     X NDEL(3,3)
      REAL SPEECH,RPARAM,LPFSPF
      NUMSEG = 10
      MAXPNT = 800
      MXSETS = 10
      NFC = 0
      DO 1001 J10=1,3
      DO 1000 J11=1,3
      NDEL(J10,J11) = 0
 1000 CONTINUE
 1001 CONTINUE

C
C***INITIALIZES ARRAY THAT STORES OLD SPEECH FOR WINDOW INFORMATION
C
      DO 8 K5=1,120
      OLDSPC(K5) = 0.0
    8 CONTINUE
C
C***  READ INPUT AND OUTPUT FILES FOR DATA***
C
      NFILES = 4
      CALL IOF(NFILES,MAIN,FILSPI,OUTFRM,PARAM,FLTRFL,MS,S1,
     X S2,S3,S4)
      CALL OPEN(1,FILSPI,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ON ",FILSPI(1),IER
      CALL OPEN(2,OUTFRM,3,IER)
      CALL OPEN(3,PARAM,1,IER)
      IF (IER .NE. 1)TYPE"OPEN ERROR ON ",PARAM(1),IER
      DO 10 I=1,25
      READ FREE(3) RPARAM(I)
   10 CONTINUE
      DO 11 I=1,25
      READ FREE(3) IPARAM(I)
   11 CONTINUE
      CALL CLOSE(3,IER)
      IF (IER .NE. 1)TYPE"CLOSE FILE ",PARAM(1),IER
      CALL OPEN(4,FLTRFL,1,IER)
X     TYPE"PAST OPEN ON FLTRFL"
      IF (IER.NE.1) TYPE"OPEN ERROR ON ",FLTRFL(1),IER
C
C***READ IN NO. OF FILTER POINTS(MAX=120 POINTS)WHERE NFLTRZ=0'S,
C   NFPOLE=POLE'S
      READ FREE(4) NFLTRZ
X     TYPE"NFLTRZ= ",NFLTRZ
      READ FREE(4) NFPOLE
      DO 13 I=1,NFLTRZ
      READ FREE(4) ZROCOF(I)
   13 CONTINUE
   14 CONTINUE
      CALL CLOSE(4,IER)
      IF(IER .NE. 1)TYPE"CLOSE FILE ERROR ",FLTRFL(1),IER
C
C***FIND MAX VALUE OF SPEECH
C
      IBIG = 88
      MBLOCK = 1
      NV = 0
      N600 = 0
      N500 = 0
   29 CONTINUE
      CALL RDBLK(1,NV,NSPMAX,MBLOCK,IENDS)
      DO 30 J=1,256

      N550 = IABS(NSPMAX(J))
      IF(N550 .GT. N500)N500 = N550
      N600 = N600 + 1
   30 CONTINUE
      NV = NV + 1
      IF((NV.LT.IBIG).AND.(IENDS.NE.9))GO TO 29
X     TYPE"THE FOLLOWING NO. OF POINTS WERE CHECKED ",N600
X     TYPE"AND THE MAX. VALUE FOUND WAS ",N500
      CALL REWIND(1)
      S200 = 32000.0/FLOAT(N500)

C
C***INPUT 10 BLOCK SEGMENT-IF NOT POSSIBLE ZERO FILL WHATS LEFT OF
C   THE LAST SET***
      M100 = 0
      FILRND = 0
      OVLSET = 1
      BLOCKS = 0
      ZEROSP = 0
      CLPLEV = RPARAM(1)

AUTTHR  a  RPARAH(S) 

MXSTPT  >  IPARAMO)  »  USUALLY  80 

VOCTHR  -  RP ARAMS 3) 

      SILTHR = RPARAM(10)
      FOURPS = 4 * MXSTPT     ; FOUR SETS
   15 N = 1
C***  LOADS ZEROS FOR LAST RUN
      IF(FILRND .EQ. 0)GO TO 44
      DO 41 KN=5,8
      DO 40 KN2=1,PNTSET
      SPEECH(KN,KN2) =0.0
   40 CONTINUE
   41 CONTINUE
      LSTSET = 8
      LSTRND = 1
      GO TO 75
   44 CONTINUE
      PNTSET = MXSTPT
      NSETS = OVLSET          ; 1 FIRST TIME, ELSE 5
      NPOINT = 1
      LSTSET = MXSETS
      POINTS = MXSETS * MXSTPT
C
C***  CHECK FOR LAST SET IN CURRENT SERIES
C
   45 IF (NSETS .GT. MXSETS) GO TO 75
   55 IF (NPOINT .LT. (MXSTPT + 1)) GO TO 65
C***  COUNTS THROUGH SETS
C
      NPOINT = 1
      NSETS = NSETS + 1
      GO TO 45
C
C***  READS IN A RAW SPEECH FILE
C
   65 READ BINARY(1,END=70) ISPECH      ; READ RAW SPEECH
      M100 = M100 + 1
      SPEECH(NSETS,NPOINT) = FLOAT(ISPECH)*S200
   68 CONTINUE
      NPOINT = NPOINT + 1
      GO TO 55
   70 LSTSET = NSETS
X     TYPE"LSTSET = ",LSTSET
      ZEROSP = 1
      FILRND = 1
      DO 71 I=NPOINT,MXSTPT   ; FILLS LAST SET WITH ZEROS
      SPEECH(NSETS,I) =0.0
   71 CONTINUE
C
C***  ZERO FILL TILL LAST FRAME
C
      IF(LSTSET .LE. 6)LSTRND = 1
      FILRND = 1
      IF(LSTSET .EQ. 10)GO TO 75
      LSTSET = LSTSET + 1
      DO 73 N30=LSTSET,10
      DO 72 N40=1,PNTSET
      SPEECH(N30,N40) =0.0
   72 CONTINUE
   73 CONTINUE
   75 CONTINUE

X     ACCEPT"LPF ?1=YES, 0=NO ",IYS
X     IF(IYS .EQ. 0)GO TO 9000
      CALL LPF(SPEECH,LSTSET,ZROCOF,NFLTRZ,PNTSET,OLDSPC,
     X TEMPSP,OVLSET)
C     ACCEPT" ENERGY? 1=Y, 0=N",IYS
C     IF(IYS .EQ. 0)GO TO 9000
      CALL ENERGY(SPEECH,LSTSET,FRAMAR,SILTHR,PNTSET,VOCTHR)
      TLSTST = LSTSET - 2     ; CALCULATE EXCEPT LAST 2 SETS
      BEGSET = 1
X     ACCEPT"DO SETMAX? 1=Y,0=N",IYS
X     IF(IYS .EQ. 0)GO TO 9000
      CALL SETMAX(SPEECH,LSTSET,TFRMAR,PNTSET,RPARAM(1),BEGSET)
X     ACCEPT" DO CLPPER? 1=Y, 0=N",IYS
X     IF(IYS .EQ. 0)GO TO 9000
      CALL CLPPER(SPEECH,LSTSET,TFRMAR,PNTSET,BEGSET)
X     ACCEPT" DO AUTCOR? 1=Y,0=N",IYS
X     IF(IYS .EQ. 0)GO TO 9000
      CALL AUTCOR(SPEECH,LSTSET,AUTTHR,PNTSET,FRAMAR,BEGSET)
C***THE PURPOSE OF THIS CODE IS TO ACCOUNT FOR THE 2 SET OVERLAP
C   PROBLEM BETWEEN BLOCKS READ IN***
C   WRITES LAST TWO SETS INTO NEXT FIRST TWO SETS
      IF (LSTRND .EQ. 1)GO TO 800
      DO 800 J=1,4
      DO 790 K=1,PNTSET
      SPEECH(J,K) = TEMPSP(J,K)
  790 CONTINUE
  800 CONTINUE
      OVLSET = 5
      BLOCKS = BLOCKS + 1
X     TYPE" JUST RAN 640 SAMPLES OF SPEECH. SEQ. NO. : ",BLOCKS
      NSTTMP = LSTSET - 4
      DO 860 N3=1,NSTTMP
      NP0 = FRAMAR(N3,3)
      NFC = NFC +1
      NP1 = NDEL(1,3)
      NP2 = NDEL(2,3)
      NP3 = NDEL(3,3)
  803 IF(ABS(NP1-NP3).LE..375*NP3)NP2=(NP1+NP3)/2.0
C     IF (NP0 AND NP1 ARE CLOSE) AND (NP2 NOT 0)
C     BUT NP3=0, THEN USE LINEAR EXTRAPOLATION OF NP2
C     (COMING OUT OF VOICED)
      IF(NP3.NE.0)GO TO 809
      IF(NP2.EQ.0)GO TO 809
      IF(ABS(NP0-NP1).GT.0.2*NP1)GO TO 809
      NP2 = (2.0*NP1)-NP0
C***  TEST FOR ISOLATED "VOICED" AND INCORRECT END OF "VOICED"
  809 IF(NP1.NE.0)GO TO 810
      IF(ABS(NP2-NP3).GT.(.375*NP3))NP2=0
  810 NDEL(3,3) = NP2
      NDEL(3,2) = NDEL(2,2)
      NDEL(3,1) = NDEL(2,1)
      NDEL(2,3) = NP1
      NDEL(2,2) = NDEL(1,2)
      NDEL(2,1) = NDEL(1,1)
      NDEL(1,3) = NP0
      NDEL(1,2) = FRAMAR(N3,2)
      NDEL(1,1) = FRAMAR(N3,1)
      IF (NFC.LT.3)GO TO 850
      WRITE(2,849)NDEL(3,1),NDEL(3,2),NDEL(3,3)
  849 FORMAT(3X,3(I10,3X))
  850 CONTINUE
  860 CONTINUE
X     TYPE"M100= ",M100
      IF (LSTRND .EQ. 1)GO TO 9000
      GO TO 15
 9000 CONTINUE
      WRITE(2,849)NDEL(2,1),NDEL(2,2),NDEL(2,3)
      WRITE(2,849)NDEL(1,1),NDEL(1,2),NDEL(1,3)
      CALL CLOSE(1,IER)
      IF (IER .NE. 1) TYPE"CLOSE FILE ERROR1",IER
      CALL CLOSE(2,IER)
      IF (IER .NE. 1) TYPE"CLOSE FILE ERROR2",IER
      TYPE "THE PROGRAM HAS NOW COMPLETED"
      STOP
      END
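
Both pitch detectors finish with the same three-frame smoothing rule: a new raw estimate enters a short history, the oldest entry may be corrected using its neighbours, and that oldest entry is what gets written out, two frames late. The Python sketch below restates the rule on a list of three values; the name smooth_step and the list-based history are illustrative assumptions.

```python
def smooth_step(p0, hist):
    """Advance the three-frame pitch history by one raw estimate p0.

    hist = [p1, p2, p3] holds the previous estimates, newest first.
    Returns the new history; its last entry is the smoothed value that
    would be written out (delayed by two frames).
    """
    p1, p2, p3 = hist
    # Neighbours agree: replace the middle value by their average.
    if abs(p1 - p3) <= 0.375 * p3:
        p2 = (p1 + p3) / 2.0
    # Coming out of a voiced region: extrapolate linearly.
    if p3 == 0 and p2 != 0 and abs(p0 - p1) <= 0.2 * p1:
        p2 = 2.0 * p1 - p0
    # Isolated "voiced" frame or incorrect end of voicing: clear it.
    if p1 == 0 and abs(p2 - p3) > 0.375 * p3:
        p2 = 0.0
    return [p0, p1, p2]
```

Feeding the raw per-frame estimates through smooth_step one at a time, starting from [0, 0, 0], reproduces the delay-by-two behaviour described in the comments.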


      SUBROUTINE LPF(SPEECH,LSTSET,ZROCOF,NFLTRZ,PNTSET,
     X OLDSPC,TEMPSP,OVLSET)
C********************************************************************
C     THIS SUBROUTINE IS A LOW PASS FILTER(MAX 100 POINTS) WITH
C     A CUT OFF FREQUENCY OF 900HZ. THE TYPE OF WINDOW CHOSEN FOR
C     THE FILTER IS USER DEPENDENT. THIS ROUTINE IS PASSED IN A FILE OF
C     FILTER COEFFICIENTS WHERE THE FILE IS AN OUTPUT OF KATHY WARD'S
C     WFILTER. THE FIRST NUMBER READ IS THE NUMBER OF ZERO COEFFICIENTS
C     (USUALLY 99), THE SECOND IS THE NUMBER OF POLE COEFFICIENTS(ALWAYS
C     ZERO, HOWEVER THE PROGRAM COULD BE UPDATED TO ACCEPT IIR FILTERS)
C     AND THE NEXT VALUES(USUALLY 99) ARE THE ZERO COEFFICIENTS
C     B(0) TO B(99). THE FORMULA USED FOR THE FILTER IS Y(N)=
C     SUM(FROM K=0 TO M) OF B(K)*X(N-K) WHERE M IS THE NO. OF FILTER
C     POINTS.
C
C     INPUTS:  SPEECH-THIS IS THE INPUT SPEECH FILE (ALSO THE OUTPUT)
C              LSTSET-THE NUMBER OF THE LAST SET TO BE PROCESSED
C              ZROCOF-THIS IS A SET OF FILTER COEFFICIENTS
C              NFLTRZ-THIS IS THE NUMBER OF ZERO COEFFICIENTS
C              PNTSET-THIS IS THE NUMBER OF POINTS IN 1/3 OF A FRAME
C              OLDSPC-THIS IS OLD SAVED SPEECH
C              TEMPSP-THIS TEMPORARY ONE DIMENSIONAL ARRAY HOLDS SPEECH
C                     FOR PROCESSING
C              OVLSET-INDICATES SET PROCESSING STARTING POINT
C
C     OUTPUTS: SPEECH-THIS IS THE OUTPUT SPEECH FILE (ALSO THE INPUT)
C
C********************************************************************
      DIMENSION ZROCOF(100),SPEECH(LSTSET,PNTSET),
     X SPHLPF(-119:800),OLDSPC(120),U(500),T(500),TEMPSP(4,PNTSET)
      INTEGER PNTSET,OVLSET
C***WRITE SPEECH FILE INTO A TEMPORARY FILE SPHLPF=SPEECH LPF
C***
C***  PUT 120 VALUES FROM LAST RUN INTO SPHLPF FOR THE WINDOW TO USE
C***
      DO 20 N3=1,120
      N4 = N3 - 120
      SPHLPF(N4) = OLDSPC(N3)
C     WRITE(12,29)SPHLPF(N4),N4
C29   FORMAT(3X,'SPHLPF= ',F10.2,3X,'N4= ',I10)
   20 CONTINUE
      J1 = 0
C
C***EXCEPT FIRST TIME UNFILTERED SPEECH IS LOADED INTO SPHLPF
C
      J2 = 0
      IF(OVLSET .EQ. 1)GO TO 30
      DO 28 M7=1,4
      DO 26 M8=1,PNTSET
      J2 = J2 + 1
      SPHLPF(J2) = TEMPSP(M7,M8)
   26 CONTINUE
   28 CONTINUE
      J1 = J2                 ; SETS NEW STARTING POINT FOR SPHLPF
   30 DO 40 I=OVLSET,LSTSET
      DO 39 J=1,PNTSET
      J1 = J1 + 1
      SPHLPF(J1) = SPEECH(I,J)
C     WRITE(12,29)SPHLPF(J1),J1
   39 CONTINUE
   40 CONTINUE
C
C***PUT LAST 120 VALUES OF SPHLPF INTO OLDSPC TO BE READY FOR NEXT RUN
C
      DO 43 N6=1,120
      N7 = J1 - 120 - PNTSET * 4
      IF (N7 .LE. 0)GO TO 47  ; NO SAVING REQUIRED, LAST RUN
      N8 = N7 + N6
      OLDSPC(N6) = SPHLPF(N8)
   43 CONTINUE
   47 CONTINUE
C
C***  SAVES NON LPF SPEECH IN A TEMPORARY ARRAY FOR NEXT LPF RUN
C
      IF(LSTSET .LT. 10)GO TO 70
      LTEMP = LSTSET - 4
      DO 78 M1=1,4
      M3 = LTEMP + M1
      DO 77 M2=1,PNTSET
      TEMPSP(M1,M2) = SPEECH(M3,M2)
   77 CONTINUE
   78 CONTINUE
C
C***  SET PLOT VARIABLES
C
      NGPONT = 300
      YMAX = 3000
      YMIN = -3000
      NG = 1
      MODE = 0
      IFSLC = 0
C***  SEND SPEECH THROUGH THE FILTER
      NSET = 1
      NSTPNT = 1
      DO 60 N=1,J1
      SUM = 0.0
C***  MULTIPLY SEQUENCE AND FILTER AT DELAYS
      DO 50 K=1,NFLTRZ
      NKDIFF = N - K
      SUM = ZROCOF(K) * SPHLPF(NKDIFF) + SUM
   50 CONTINUE
   52 IF (NSTPNT .LT. (PNTSET+1)) GO TO 53
      NSET = NSET + 1
      NSTPNT = 1
   53 CONTINUE
C***PUT FILTERED SPEECH BACK INTO SPEECH FILE
      SPEECH(NSET,NSTPNT) = SUM
C     WRITE(12,59)SUM,N
C59   FORMAT(3X,'SPEECH= ',F10.2,3X,'N=',I10)
      NSTPNT = NSTPNT + 1
X     IF(N .GT. 300)GO TO 60
X     T(N) = SUM
   60 CONTINUE
X     CALL GRPH2("LPF",NG,T,U,NGPONT,MODE,YMIN,YMAX,IFSLC)
      RETURN
      END
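
LPF filters the speech block by block and carries the most recent input samples (OLDSPC) across calls so that the FIR sum at the start of a block can reach back into the previous block. The Python sketch below shows only that idea, under simplifying assumptions: blocks do not overlap (the listing additionally re-feeds four saved sets through TEMPSP), the history is non-empty and at least as long as the tap list, and the names fir_block, block and history are hypothetical.

```python
def fir_block(b, block, history):
    """Filter one block with FIR taps b, using saved input history.

    history holds the input samples that preceded this block, oldest
    first. The sum mirrors the listing's indexing:
    y[n] = sum over k = 1..len(b) of b[k-1] * x[n-k].
    Returns (output, new_history) so that consecutive calls chain.
    """
    x = list(history) + list(block)
    offset = len(history)
    out = []
    for n in range(len(block)):
        acc = 0.0
        for k in range(1, len(b) + 1):
            acc += b[k - 1] * x[offset + n - k]
        out.append(acc)
    new_history = x[-len(history):]      # keep the same amount of context
    return out, new_history
```

Filtering a signal in two consecutive blocks, passing the returned history from the first call into the second, gives the same samples as filtering it in one call.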
      SUBROUTINE ENERGY(SPEECH,LSTSET,FRAMAR,SILTHR,PNTSET,VOCTHR)
C********************************************************************
C     THIS ROUTINE USES A RECTANGULAR WINDOW 120 SAMPLES LONG. THIS WAS
C     CHOSEN AS A COMPROMISE TO REFLECT THE CHANGING PROPERTIES OF THE
C     SPEECH SIGNAL YET NOT BE LESS THAN A PITCH PERIOD.
C
C     INPUTS:  SPEECH-THIS IS THE INPUT SPEECH FILE
C              LSTSET-THIS IS THE LAST SET TO BE PROCESSED
C              PNTSET-THIS IS HOW MANY POINTS IN 1/3 OF A FRAME
C              VOCTHR-THIS IS THE VOICED/UNVOICED THRESHOLD
C              SILTHR-THIS IS THE SILENCE THRESHOLD
C
C     OUTPUTS: FRAMAR-THIS IS A 2-D ARRAY THAT CONTAINS PITCH INFORMATION
C                     WHERE THIS SUBROUTINE UPDATES THE SILENCE AND
C                     VOICED INDICATOR
C
C********************************************************************
      DIMENSION SPEECH(LSTSET,PNTSET),SQRD(240),
     X SUM(240),FRAMAR(LSTSET,3),TEMP(500),U(500)
      INTEGER FRAMAR,CURRNT,LAST,TOTL,FIRST,LSTSET,
     X TWCPST,PNTSET
      LSTENC = LSTSET - 4
      FIRST = 0
C
C***INITIALIZE FRAMAR
C
X     M6 = 0
      DO 10 M1=1,LSTSET
      DO 9 M2=1,3
      FRAMAR(M1,M2) = 0
    9 CONTINUE
   10 CONTINUE
      TWCPST = 2 * PNTSET
      DO 100 I=1,LSTENC
      IF (I .GT. 1) FIRST = 1
C
C***FILL THE FIRST TWO SETS AND PUT THEM IN SQRD A TEMPORARY ARRAY
C
      IF (FIRST .EQ. 1) GO TO 25
      DO 20 K=1,PNTSET
      SQRD(K) = (SPEECH(I,K)/1000.0)**2      ; NORMALIZE SPEECH
      K1 = K + PNTSET
      SQRD(K1) = (SPEECH(I+1,K)/1000.0)**2
      K2 = K + TWCPST
      SQRD(K2) = (SPEECH(I+2,K)/1000.0)**2
   20 CONTINUE
      GO TO 35
C***PUT LAST 160 SQUARED VALUES IN PLACE OF FIRST 160***
   25 DO 30 L=1,TWCPST
      SQRD(L) = SQRD(L+PNTSET)
   30 CONTINUE
C***LOAD NEW 3RD SET ***
      DO 33 L1=1,PNTSET
      SQRD(L1+TWCPST) = (SPEECH(I+2,L1)/1000.0)**2
   33 CONTINUE
   35 CONTINUE
C
C***TOTAL UP SQUARES IN 119 POINT WINDOWS(COMPUTE NEW WINDOW ON 1 POINT
C   SHIFTS)
      DO 60 CURRNT=1,PNTSET
      LAST = CURRNT + 120
      TOTL = 0
      DO 59 N=CURRNT,LAST
      TOTL = TOTL + SQRD(N)
   59 CONTINUE
C
C***CREATE TEMPORARY ARRAY FOR PLOTTING
C
X     IF(M6 .GT. 500)GO TO 57 ; ARRAY ONLY HOLDS 500 VALUES
X     TEMP(M6) = TOTL
X     M6 = M6 + 1
X57   CONTINUE
C***SEE IF EXCEEDS SILENCE THRESHOLD***
      IF (TOTL .LT. SILTHR) GO TO 58
      FRAMAR(I,2) = 1
C
C***SEE IF EXCEEDS VOICED THRESHOLD
C
   58 IF(TOTL.LT.VOCTHR)GO TO 60
      FRAMAR(I,1) = 1
      GO TO 90
   60 CONTINUE
   90 CONTINUE
X     TYPE"FRAMAR(I,1)= ",FRAMAR(I,1),I
  100 CONTINUE
C
C***  PLOT ENERGY
C
X     NG = 1
X     MODE = 0
X     IFSLC = 0
X     M6 = M6 - 1
X     CALL GRPH2("ENERGY",NG,TEMP,U,M6,MODE,YMIN,YMAX,IFSLC)
      RETURN
      END
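
ENERGY slides a rectangular window one sample at a time over the squared speech and sets the speech and voiced flags from the window totals. The Python sketch below restates that decision logic. It swaps in a running sum for the listing's full re-summation of every window, which is a different (cheaper) technique that gives the same totals; the names window_energies and frame_flags, and the floating-point totals, are illustrative assumptions.

```python
def window_energies(sq, npts, width=120):
    """Totals of sq[c : c + width + 1] for c = 0 .. npts - 1.

    sq holds squared samples; a running sum replaces the listing's
    inner loop over each window.
    """
    total = sum(sq[:width + 1])
    out = [total]
    for c in range(1, npts):
        total += sq[c + width] - sq[c - 1]
        out.append(total)
    return out


def frame_flags(sq, npts, silthr, vocthr, width=120):
    """Return (voiced, speech) flags for one set, as in FRAMAR(I,1..2)."""
    voiced = speech = 0
    for e in window_energies(sq, npts, width):
        if e >= silthr:
            speech = 1
        if e >= vocthr:          # first voiced window ends the scan
            voiced = 1
            break
    return voiced, speech
```

With 80 points per set and three sets of squared samples (240 values), frame_flags(sq, 80, silthr, vocthr) corresponds to one pass of the outer loop over I.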
      SUBROUTINE SETMAX(SPEECH,LSTSET,TFRMAR,PNTSET,CLPLEV,BEGSET)
C
C     THIS SUBROUTINE CALCULATES THE MAXIMUM VALUE IN EACH SET
C     THEN CALCULATES THE FRAME THRESHOLD FOR CLIPPING BY THE FOLLOWING
C     FORMULA CLIPLEVEL = K*MIN(PEAK1,PEAK2) WHERE PEAK1 AND
C     PEAK2 ARE THE PEAK VALUES IN THE FIRST THIRD AND LAST THIRD OF A
C     FRAME
C
C     INPUTS:  SPEECH-THIS IS THE INPUT SPEECH FILE
C              LSTSET-THIS IS THE LAST SET TO BE PROCESSED
C              PNTSET-THIS IS HOW MANY POINTS ARE IN 1/3 OF A FRAME
C              CLPLEV-THIS IS THE CLIPPING LEVEL IN PERCENT OF MAX
C              BEGSET-THIS IS THE FIRST SET TO BE PROCESSED
C
C     OUTPUT:  TFRMAR-THE ACTUAL VALUE THE SUBROUTINE CLPPER WILL USE
C                     FOR CLIPPING THIS FRAME OF SPEECH
C
C--------------------------------------------------------------------
      DIMENSION SPEECH(LSTSET,PNTSET),TFRMAR(LSTSET)
      INTEGER PNTSET,BEGSET
      REAL MAXVAL,CLPPCR
C
C---FIND MAXIMUM VALUE IN EACH SET
C
      DO 70 NSET=BEGSET,LSTSET
      TFRMAR(NSET) =0.0
      DO 60 N=1,PNTSET
      IF (SPEECH(NSET,N).GT.TFRMAR(NSET))TFRMAR(NSET)=SPEECH(NSET,N)
   60 CONTINUE
   70 CONTINUE
C---CALCULATE CLIPPING LEVEL IN EACH FRAME EXCEPT THE LAST TWO
      LSTFRM = LSTSET - 2
      DO 80 I=BEGSET,LSTFRM
C     CHOOSE THE SMALLER ONE UNLESS ZERO
      IF(TFRMAR(I+2) .LT. 10)GO TO 79
      IF(TFRMAR(I).GT.TFRMAR(I+2))TFRMAR(I) = TFRMAR(I+2)
C---SET CLIPPING LEVEL
   79 TFRMAR(I) = CLPLEV * TFRMAR(I)
X     TYPE"TFRMAR(I)=",TFRMAR(I),I
   80 CONTINUE
      RETURN
      END
      SUBROUTINE CLPPER(SPEECH,LSTSET,TFRMAR,PNTSET,BEGSET)
C
C     THE PURPOSE OF THIS SUBROUTINE IS TO DO CENTER AND PEAK CLIPPING.
C     THE THRESHOLD IS SET FOR EACH FRAME IN SUBROUTINE SETMAX. IN A
C     FRAME, VALUES ABOVE THE THRESHOLD ARE SET TO +1.0, VALUES BELOW
C     -THRESHOLD ARE SET TO -1.0, AND VALUES IN BETWEEN ARE SET TO 0.0
C
C     INPUTS  SPEECH-THIS IS THE INPUT SPEECH FILE (ALSO AN OUTPUT)
C             LSTSET-THIS IS THE LAST SET TO BE PROCESSED
C             TFRMAR-THIS IS THE CLIPPING THRESHOLD
C             PNTSET-THIS IS THE NUMBER OF POINTS IN 1/3 OF A FRAME
C             BEGSET-THIS IS THE FIRST SET TO BE PROCESSED
C
C     OUTPUT  SPEECH-THIS IS THE SPEECH THAT HAS BEEN CLIPPED
C
      DIMENSION SPEECH(LSTSET,PNTSET),TFRMAR(LSTSET),TEMP(500),
     X          U(500)
      INTEGER BEGSET,PNTSET
      LSTFRM = LSTSET - 2
      DO 70 NSET=BEGSET,LSTFRM
      DO 60 N=1,PNTSET
      IF (SPEECH(NSET,N) .LT. TFRMAR(NSET)) GO TO 50
      SPEECH(NSET,N) = 1.0
      GO TO 60
   50 IF (SPEECH(NSET,N) .LT. (-TFRMAR(NSET))) GO TO 52
      SPEECH(NSET,N) = 0.0
      GO TO 60
   52 SPEECH(NSET,N) = -1.0
   60 CONTINUE
   70 CONTINUE
C
C***  PLOT CLPPER
C
X     M8 = 1
X     NG = 1
X     MODE = 0
X     IFSLC = 0
X     DO 105 M5=BEGSET,LSTFRM
X     DO 100 M6=1,PNTSET
X     IF(M8 .GT. 500)GO TO 110
X     TEMP(M8) = SPEECH(M5,M6)
X     M8 = M8 + 1
X 100 CONTINUE
X 105 CONTINUE
X 110 CONTINUE
X     M8 = M8 - 1
X     CALL GRPH2("CLPPER",NG,TEMP,U,M8,MODE,YMIN,YMAX,IFSLC)
      RETURN
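
The three-level center and peak clipping performed by CLPPER can be illustrated in a few lines. This is a minimal Python sketch, assuming a flat list of samples and a single threshold; the name `center_clip` is hypothetical and does not appear in the listing. As in the FORTRAN, a sample exactly equal to the threshold maps to +1.0 and a sample exactly equal to the negated threshold maps to 0.0.

```python
def center_clip(samples, threshold):
    """Three-level clipper: +1.0 at or above the threshold,
    -1.0 below the negated threshold, 0.0 in between."""
    clipped = []
    for x in samples:
        if x >= threshold:        # listing: skip only when x .LT. threshold
            clipped.append(1.0)
        elif x < -threshold:
            clipped.append(-1.0)
        else:
            clipped.append(0.0)
    return clipped
```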
      SUBROUTINE AUTCOR(SPEECH,LSTSET,AUTTHR,PNTSET,FRAMAR,BEGSET)
C
C     THIS ROUTINE COMPUTES THE SHORT TIME AUTOCORRELATION FUNCTION USING
C     THE FIRST 1/3 OF THE FRAME WITH LAGS FROM N=16 TO 160 (2 TO 20 MSEC).
C     NEXT, THE FIRST PEAK AFTER R(0) IS DETERMINED. IF IT EXCEEDS THE
C     THRESHOLD (I.E. VOICED) THEN THE FRAME'S PITCH IS DETERMINED, OTHERWISE
C     THE FRAME IS CONSIDERED UNVOICED.
C
C     INPUTS  SPEECH-THIS IS THE INPUT SPEECH FILE
C             LSTSET-THIS IS THE LAST SET TO BE PROCESSED
C             AUTTHR-THIS IS THE AUTOCORRELATION THRESHOLD WHICH IS USED TO
C                    FIND THE FUNDAMENTAL PEAK
C             PNTSET-THIS IS THE NUMBER OF POINTS IN 1/3 OF A FRAME
C             BEGSET-THIS IS THE FIRST SET TO BE PROCESSED
C
C     OUTPUT  FRAMAR-THIS 2-D ARRAY CONTAINS PITCH INFORMATION WHERE
C                    THIS SUBROUTINE ENTERS A PITCH VALUE IF VOICED
C
C*********************************************************************
C
      DIMENSION SPEECH(LSTSET,PNTSET),SPCH(240),FRAMAR(10,3),
     X          TMPAUT(0:160)
      INTEGER BEGWIN,ENDWIN,POS,FRAMAR,BEGSET,TWOPST,PNTSET,
     X        CNSET1,CNSET2
      REAL MAX
      LSTFRM = LSTSET - 4
      N6 = 0
      TWOPST = 2 * PNTSET
      DO 170 NSET=BEGSET,LSTFRM
      FMIN = 10000    ;INITIALIZING FOR FIRST VALLEY
      N36 = 0
      N6 = N6 + 1
      R0 = 0.0
C
C***  CALCULATE R(0) FOR THE FRAME
C
      DO 100 N=1,PNTSET
      R0 = R0 + SPEECH(NSET,N)**2
  100 CONTINUE
      TMPAUT(0) = R0
C
C***  INITIALIZE SPCH TO ZERO
C
      DO 103 M16=1,240
      SPCH(M16) = 0.0
  103 CONTINUE
C
C***  CALCULATE R(K) FOR THE FRAME
C***  PUT 3 SETS INTO NEW ARRAY SPCH
C
      MN = 0
      CNSET1 = NSET
      CNSET2 = CNSET1 + 2
      DO 110 I=CNSET1,CNSET2
      DO 105 J=1,PNTSET
      MN = MN + 1
      SPCH(MN) = SPEECH(I,J)
  105 CONTINUE
  110 CONTINUE

      DO 155 MN=1,2
      DO 150 K=0,TWOPST
      DO 130 N=1,PNTSET
      RN = RN + SPCH(K+N) * SPCH(N)
  130 CONTINUE
      TMPAUT(K) = RN
X     N5 = K
  150 CONTINUE
      HALF = TMPAUT(0)/3.0
      DO 1000 NF=0,TWOPST
      IF(TMPAUT(NF) .LT. HALF)GO TO 900
      TMPAUT(NF) = TMPAUT(NF) - HALF
      GO TO 990
  900 IF(TMPAUT(NF) .LT. (-HALF))GO TO 990
      TMPAUT(NF) = 0.0
  990 TMPAUT(NF) = TMPAUT(NF) + HALF
 1000 CONTINUE
  155 CONTINUE
X     N9 = N9 + 1
X     IF((NSET .EQ. LSTFRM) .AND. (N6 .EQ. 4))GO TO 199
X     GO TO 198
X 199 CALL PLOT10(TMPAUT,N9,N6,X0,Y0,1.0)
X     N6 = 10
X 198 CALL PLOT10(TMPAUT,N9,N6,X0,Y0,1.0)
X     IF(NSET .EQ. LSTFRM)N6 = 10

      IF(FRAMAR(NSET,1) .EQ. 0)GO TO 169
      MAX = 0.0
C***  SEARCH FOR FIRST PEAK TO EXCEED THE THRESHOLD TTSTER
C
      DO 163 NTT=1,1
      IMULT = NTT - 1
      TTH = .05 * FLOAT(IMULT) + .9
      TTSTER = R0 * (1.0 - TTH)
      DO 162 NST=50,160
      IF(TMPAUT(NST) .LT. TTSTER)GO TO 162
      POS = NST
      MAX = TMPAUT(NST)
      GO TO 164
  162 CONTINUE
  163 CONTINUE
  164 CONTINUE
C***  NORMALIZE THE PEAK
      MAX = MAX/R0
      IF (MAX .LT. AUTTHR) GO TO 168
C***  VOICED DECISION = 1, UNVOICED = 0, POS = PITCH
      FRAMAR(NSET,2) = 1
      FRAMAR(NSET,3) = POS
      GO TO 170
  168 FRAMAR(NSET,2) = 0
  169 CONTINUE
  170 CONTINUE
      RETURN
      END
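
The voiced/unvoiced decision described in the AUTCOR header can be sketched as follows. This Python version is an illustration only, not a transcription of the subroutine. It omits the clipping pass applied to the autocorrelation values. It takes its default lag range (16 to 160) from the header comment, not from the search loop, whose start value is hard to read in the listing. The 0.1 search fraction corresponds to 1.0 - TTH with TTH = 0.9. The names `autocorrelation` and `detect_pitch` are hypothetical.

```python
def autocorrelation(frame, npts, max_lag):
    """R(k) = sum over the first npts samples of frame[n] * frame[n + k]."""
    return [sum(frame[n] * frame[n + k] for n in range(npts))
            for k in range(max_lag + 1)]


def detect_pitch(frame, npts, autthr, min_lag=16, max_lag=160, frac=0.1):
    """Return (voiced, lag): voiced is 1 with the pitch lag, or (0, 0)."""
    r = autocorrelation(frame, npts, max_lag)
    r0 = r[0]
    if r0 <= 0.0:
        return 0, 0                      # silent frame, treated as unvoiced
    for lag in range(min_lag, max_lag + 1):
        if r[lag] >= frac * r0:          # first lag to reach the search level
            if r[lag] / r0 >= autthr:    # normalized peak against AUTTHR
                return 1, lag
            return 0, 0
    return 0, 0
```

The frame passed in must hold at least `npts + max_lag` samples, which matches the 240-point SPCH buffer when `npts` is 80 and `max_lag` is 160.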
      PROGRAM SCALE.FR
C
C     THIS PROGRAM SCALES SPEECH FILES SO THAT THERE IS A MAX VALUE
C     OF 1900 AND CAN DE-EMPHASIZE SPEECH
C
C     INPUT: MUST BE A BLOCKED FILE
C
C*********************************************************************
      DIMENSION S1(256)
      DOUBLE PRECISION IX
      INTEGER OUTFILE(7),INFILE(7),FILBUF(18),SPEECH(256)
      VNOSIZ = 0
      IX = DBLE(203)
      NNEWS = 0
      ACCEPT"WARNING: THE INPUT FILE MUST BE AN INTEGER FILE <CR>
     X AND BE IN BLOCKED FORM. <CR> <CR>
     X DO YOU WISH TO CONTINUE?(1-Y,0-N) ",NYZ
      IF(NYZ .EQ. 0)GO TO 60
      ACCEPT"INPUT FILENAME: "
      READ(11,39)INFILE(1)
   39 FORMAT(S13)
      OPEN 1,INFILE,ATT="CI",ERR=40
      FLIP = 1.0
      ACCEPT"OUTPUT FILENAME : "
      READ(11,39)OUTFILE(1)
      OPEN 2,OUTFILE,ERR=50
      NDE = 0
      ACCEPT"FLIP FILE (1-YES,0-NO) ",FLIPY
      IF(FLIPY .EQ. 1)FLIP=-1.0
      ACCEPT"OUTPUT FILE SIZE? ",ISIZE
      ACCEPT"PERFORM NOISE ADDITION?(1-Y,0-N)",NNOIS
      IF(NNOIS .EQ. 0)GO TO 55
      ACCEPT"SIZE OF MAX NOISE?(REAL) ",VNOSIZ
   55 CONTINUE
      MBLOCK = 1
      N15 = 0
      NV = 0
   70 N6 = 0
      S4 = 0
      DO 80 I=1,256
      S1(I) = 0.0
   80 CONTINUE
      N5 = 0
C
C---  FIRST PASS: FIND THE MAXIMUM ABSOLUTE VALUE IN THE FILE
C
  100 CONTINUE
      CALL RDBLK(1,NV,SPEECH,MBLOCK,IENDS)
      DO 200 J=1,256
      IF(NNOIS .EQ. 0)GO TO 120
      NNEWS = INT((SNGL(DRAND(IX))-.5)*VNOSIZ)
  120 SPEECH(J) = SPEECH(J) + NNEWS
  180 N2 = IABS(SPEECH(J))
      IF(N2 .GT. N5)N5 = N2
      N6 = N6 + 1
  200 CONTINUE
      NV = NV + 1
      IF(NV .LT. ISIZE)GO TO 100
  300 TYPE"THE FOLLOWING NO OF POINTS WERE CHECKED ",N6
      TYPE"AND THE MAX. VALUE FOUND WAS ",N5
      CALL REWIND(1)
      IX = DBLE(203)
      S3 = 0.0
      S4 = 0.0
      S6 = 0.0
      S2 = 1900.0 / FLOAT(N5) * FLIP
      N6 = 0
      NV = 0
C
C---  SECOND PASS: SCALE EACH BLOCK AND WRITE IT OUT
C
  600 CALL RDBLK(1,NV,SPEECH,MBLOCK,IENDS)
      DO 700 J=1,256
      IF(NNOIS .EQ. 0)GO TO 690
      NNEWS = INT(SNGL(DRAND(IX))*VNOSIZ)
  690 SPEECH(J) = SPEECH(J) + NNEWS
  680 S1(J) = FLOAT(SPEECH(J)) * S2
      SPEECH(J) = INT(S1(J))
  700 CONTINUE
      CALL WRBLK(2,NV,SPEECH,MBLOCK,IER)
      N6 = N6 + 1
      NV = NV + 1
      IF(NV .LT. ISIZE)GO TO 600
  900 CONTINUE
      N15 = N6 * 256
      TYPE"THE FOLLOWING NO. OF POINTS WERE OUTPUT ",N6
      CALL CLOSE(1,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON INPUT ",IER
      CALL CLOSE(2,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON OUTPUT ",IER
      TYPE"PROCESSED: ",N6
      GO TO 60
   50 TYPE"OPEN ERROR ON OUTPUT "
      GO TO 60
   40 TYPE"OPEN ERROR ON INPUT "
   60 STOP
      END
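
The two-pass scaling in SCALE (find the largest absolute sample, then multiply every sample by 1900 divided by that maximum, optionally with the sign flipped) can be sketched compactly. This is a Python illustration under assumptions: it works on an in-memory list instead of 256-word disk blocks, it leaves out the optional noise addition, and the name `scale_to_peak` is hypothetical. Truncation toward zero matches the behaviour of FORTRAN's INT.

```python
def scale_to_peak(samples, target=1900.0, flip=False):
    """Scale integer samples so the largest magnitude becomes `target`."""
    peak = max(abs(s) for s in samples)          # first pass: find the maximum
    if peak == 0:
        return list(samples)                     # nothing to scale
    gain = target / peak * (-1.0 if flip else 1.0)
    return [int(s * gain) for s in samples]      # second pass: scale, truncate
```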
C     PROGRAM PLTTER
C
C     THIS PROGRAM WORKS INTERACTIVELY WITH THE USER TO PRODUCE A
C     PITCH PLOT OR A PLOT OF SPEECH. THE SPEECH PLOT REQUIRES A
C     MAXIMUM VALUE OF 2000.0 TO HAVE CONSTANT VERTICAL AXIS.
C
C*********************************************************************
      DIMENSION OUTFILE(7),T(912),V(912)
      INTEGER OUTFILE
      ACCEPT"FILE NAME? "
      READ(11,39)OUTFILE(1)
   39 FORMAT(S13)
      CALL OPEN(1,OUTFILE,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ",IER
      ACCEPT"FILE OUTPUT FROM PITCH? 1-Y,0-N ",M100
      IF(M100 .EQ. 1)GO TO 1000
      DO 90 I=1,912
      T(I) = 0.0
      V(I) = 0.0
   90 CONTINUE
      NP = 0
      DO 200 M=1,900
      DO 100 I=1,912
      N = I
      IF(M100 .NE. 1)GO TO 79
      READ(1,69,END=300)J1,J2,J3
   69 FORMAT(3X,3(I10,3X))
      V(I) = FLOAT(J3)
      T(I) = FLOAT(J2)
      GO TO 100
   79 READ BINARY(1,END=300)J3
      V(I) = FLOAT(J3)
  100 CONTINUE
      NP = NP + 1
      NPTS = N
      SF = 1.0
      TYPE"RUNNING PLOT",NP,"NPTS",NPTS
      CALL PLOT10(V,NPTS,NP,X0,Y0,SF)
      IF(NP .EQ. 10)NP = 0
  200 CONTINUE
  300 CONTINUE
      IF(N .EQ. 912)GO TO 400
      DO 400 M3=N,912
      V(M3) = 0.0
  400 CONTINUE
      IF(NP .NE. 0)GO TO 490
      NP = 1
      CALL PLOT10(V,NPTS,NP,X0,Y0,SF)
  429 NP = 10
      DO 430 M3=1,912
      V(M3) = 0.0
  430 CONTINUE
      CALL PLOT10(V,NPTS,NP,X0,Y0,SF)
      GO TO 900
  490 CONTINUE
      IF(NP .NE. 6)GO TO 479
      CALL PLOT10(V,NPTS,NP,X0,Y0,SF)
      GO TO 429
  479 CONTINUE
      NP = 10
      CALL PLOT10(V,NPTS,NP,X0,Y0,SF)
  900 CALL CLOSE(1,IER)
      GO TO 9000
 1000 DO 1100 I=1,912
      N=I
      IF(M100 .NE. 1)GO TO 1079
      READ(1,69,END=1300)J1,J2,J3
      V(I) = FLOAT(J3)
      T(I) = FLOAT(J2)
      GO TO 1100
 1079 READ BINARY(1,END=1300)J1,J2,J3
      V(I) = FLOAT(J3)
 1100 CONTINUE
 1300 CONTINUE
      NPTS = N
      NP = 1
      SF = 1.0
      TYPE"RUNNING PLOT",NP
      CALL PLOT10(V,NPTS,NP,X0,Y0,SF)
      NP = 10
      TYPE"RUNNING PLOT"
      CALL PLOT10(T,NPTS,NP,X0,Y0,SF)
 1900 CALL CLOSE(1,IER)
 9000 STOP
C     PROGRAM SETUP
C
C     AUTHOR: WILL JANSSEN
C
C     DATE: 17 APRIL 83
C
C     LANGUAGE: FORTRAN 5
C
C     FUNCTION: THIS PROGRAM ALLOWS THE USER TO SETUP A FILE THAT
C               CONTAINS INFORMATION REQUIRED TO RUN THE
C               RECURSIVE LINEAR PREDICTIVE CODER (RLPC).
C
C               THE PROGRAM WILL ALLOW THE USER THE FOLLOWING
C               OPTIONS:
C
C               1) CREATE A NEW FILE
C               2) UPDATE AN OLD FILE
C               3) PRINT PARAMETERS
C
C     LOAD COMMAND LINE: RLDR SETUP @FLIB@
C
C     NOTE: 1) THE ARRAYS ARE SET TO MAX OF 25 VARIABLES
C              EACH.
C           2) THE REAL ARRAY IS CALLED RELVAR AND THE
C              INTEGER ARRAY IS CALLED INTVAR
C
C*********************************************************************
C     SETUP
C*********************************************************************
      DIMENSION RELVAR(25),INTVAR(25),OUTFILE(7)
      INTEGER YES,YES2,SIZER,SIZEI,YES3
C***  SIZER = REAL ARRAY SIZE, SIZEI = INTEGER ARRAY SIZE
      SIZER = 25
      SIZEI = 25

C***  NEW OR OLD FILE ***
      WRITE(10,29)
   29 FORMAT(3X,'THIS PROGRAM CREATES AN INPUT FILE FOR THE RLPC')
      ACCEPT"ARE YOU CREATING A FILE FROM SCRATCH?(1-YES,0-NO) ",YES
      TYPE" "
      IF (YES .EQ. 0)GO TO 30
C
C***  NEW FILE
C
      ACCEPT"DOES THE NEW FILE CURRENTLY EXIST?(1-YES,0-NO) ",YES2
      TYPE" "
      IF(YES2 .EQ. 1)GO TO 30
      TYPE" THE FILE MUST BE CREATED BEFORE USING THIS PROGRAM"
      GO TO 1000
C
C***  GET FILE NAME ***
C
   30 ACCEPT"FILE NAME? "
      READ(11,39)OUTFILE(1)
   39 FORMAT(S13)
      CALL OPEN(1,OUTFILE,3,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ",IER
      TYPE" "

C
C***  INITIALIZE THE ARRAYS. NEW FILES-SET TO 0. OLD FILES-READ IN
C     OLD FILES ***
C
      IF (YES .EQ. 0)GO TO 50
      DO 45 I=1,SIZER
   45 RELVAR(I) = 0.0
      DO 47 I=1,SIZEI
   47 INTVAR(I) = 0
      GO TO 60
   50 READ (1,901)(RELVAR(I),I=1,SIZER)
      READ (1,902)(INTVAR(I),I=1,SIZEI)
C
C***  UPDATE ARRAYS ***
C
      TYPE" <CR>
     X IF YOU CHOOSE TO CHANGE A VARIABLE ENTER Y <CR>
     X OTHERWISE JUST DO A CARRIAGE RETURN. <CR>
     X <CR>"
   60 CONTINUE

      TYPE"THE CURRENT VALUE OF CLPLEV(IN PITCH2) IS : ",RELVAR(1)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5000
      ACCEPT"<CR> INPUT NEW REAL VALUE : ",RELVAR(1)
C
 5000 TYPE"CURRENT AUTOCORRELATION THRESHOLD (IN PITCH2) IS : ",RELVAR(2)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5001
      ACCEPT"<CR> INPUT NEW VALUE: ",RELVAR(2)
C
 5001 TYPE"THE CURRENT VALUE OF VOICED/UN THRESH (PITCH2) IS: ",RELVAR(3)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5002
      ACCEPT" <CR> INPUT NEW REAL VALUE: ",RELVAR(3)
C
 5002 TYPE"THE CURRENT VALUE OF WINDOW SHAPE (CODER) : ",RELVAR(4)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5003
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(4)
C
 5003 TYPE"CURRENT VALUE OF SPEECH SCALE (IN CODER): ",RELVAR(5)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5015
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(5)
C
 5015 TYPE"CURRENT VALUE OF SILENCE THRESH (SIFT) IS: ",RELVAR(6)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5216
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(6)
C
 5216 TYPE" VALUE OF UNVOCD TO VOICED GAIN (SYNTH) IS: ",RELVAR(7)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5217
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(7)
C
 5217 TYPE" VALUE OF GLOTTAL PULSE POS. (SYNTH) IS: ",RELVAR(8)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5218
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(8)
C
 5218 TYPE" VALUE OF GLOTTAL PULSE NEG. (SYNTH) IS: ",RELVAR(9)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5219
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(9)
C
 5219 TYPE"CURRENT VALUE SILENT THR. (PITCH2) IS: ",RELVAR(10)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5013
      ACCEPT" <CR> INPUT NEW VALUE: ",RELVAR(10)
C
 5013 TYPE"CURRENT VALUE: NO. OF POINTS/SET (PITCH2) ",INTVAR(1)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5004
      ACCEPT" <CR> INPUT NEW VALUE: ",INTVAR(1)
C
 5004 TYPE" VALUE OF FILTER SPACINGS IS (CODER & SYNTH) : ",INTVAR(2)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5005
      ACCEPT" <CR> INPUT NEW VALUE: ",INTVAR(2)
C
 5005 TYPE"USE PRE/DE-EMPHASIS(1-Y,0-N) (CODER&SYNTH) : ",INTVAR(3)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5006
      ACCEPT" <CR> INPUT NEW VALUE: ",INTVAR(3)
C
 5006 TYPE"USE (0-IMPULSE,1-GLOTTAL PULSE) (SYNTH) :",INTVAR(4)
      TYPE"CHANGE VALUE? <CR>"
      CALL RCHAR(ICHAR,IER)
      IF(ICHAR .NE. 89)GO TO 5007
      ACCEPT" <CR> INPUT NEW VALUE: ",INTVAR(4)
C
 5007 CONTINUE

C 

C***  TYPE ARRAY ***
C
      ACCEPT"DO YOU WANT TO HAVE THE ARRAY TYPED(1-YES,0-NO): ",YES
      TYPE" "
      IF (YES .EQ. 0)GO TO 200
      TYPE" CLPLEV: ",RELVAR(1)
      TYPE" AUTOCORRELATION THRESHOLD: ",RELVAR(2)
      TYPE"VOICED/UN THRESHOLD: ",RELVAR(3)
      TYPE"WINDOW SHAPE VALUE: ",RELVAR(4)
      TYPE"SPEECH SCALE (IN CODER): ",RELVAR(5)
      TYPE"SILENCE THRESHOLD: ",RELVAR(6)
      TYPE"UNVOCD TO VOICED GAIN: ",RELVAR(7)
      TYPE"GLOTTAL PULSE POS. : ",RELVAR(8)
      TYPE"GLOTTAL PULSE NEG. : ",RELVAR(9)
      TYPE"SILENCE THR. IN PITCH: ",RELVAR(10)
      TYPE"NO. POINTS/SET ",INTVAR(1)
      TYPE"FILTER SPACING IS: ",INTVAR(2)
      TYPE"PRE/DE-EMPHASIS(1-Y,0-N): ",INTVAR(3)
      TYPE"GLOTTAL PULSE(1-Y,0-N) ",INTVAR(4)

C
C***  OUTPUT FILE ***
C
  200 ACCEPT"IS THE INPUT FILE THE OUTPUT FILE(1-YES,0-NO): ",YES2
      TYPE" "
      IF (YES2 .EQ. 1)GO TO 75
      CALL CLOSE(1,IER)
      IF (IER .NE. 1)TYPE"CLOSE FILE ERROR1 ",IER
      ACCEPT"FILE NAME? "
      READ(11,69)OUTFILE(1)
   69 FORMAT(S13)
      CALL OPEN(1,OUTFILE,3,IER)
      IF (IER .NE. 1)TYPE"OPEN ERROR ON OUTPUT FILE ",IER
   75 CALL REWIND(1)
      WRITE (1,901)(RELVAR(I),I=1,SIZER)
      WRITE (1,902)(INTVAR(I),I=1,SIZEI)
      CALL CLOSE(1,IER)
      IF (IER .NE. 1)TYPE"CLOSE FILE ERROR2 ",IER
      TYPE" "
      ACCEPT"PRINT ARRAY ON PRINTRONIX ?(1-Y,0-N) ",YES
      TYPE" "
      IF(YES .EQ. 0)GO TO 1001
      WRITE(12,1499)OUTFILE(1)
      CALL FGDAY(IMON,IDAY,IYEAR)
      CALL FGTIME(IHOUR,IMIN,ISEC)
      WRITE(12,1511)IMON,IDAY,IYEAR
      WRITE(12,1512)IHOUR,IMIN,ISEC
 1511 FORMAT("0","DATE ",1X,I2,"/",I2,"/",I2)
 1512 FORMAT("0","TIME ",1X,I2,":",I2,":",I2)
      WRITE(12,1500)RELVAR(1)
      WRITE(12,1501)RELVAR(2)
      WRITE(12,1502)RELVAR(3)
      WRITE(12,1503)RELVAR(4)
      WRITE(12,1504)RELVAR(5)
      WRITE(12,1505)RELVAR(6)
      WRITE(12,1506)RELVAR(7)
      WRITE(12,1507)RELVAR(8)
      WRITE(12,1508)RELVAR(9)
      WRITE(12,1509)RELVAR(10)
      WRITE(12,1600)INTVAR(1)
      WRITE(12,1601)INTVAR(2)
      WRITE(12,1602)INTVAR(3)
      WRITE(12,1603)INTVAR(4)

 1499 FORMAT(1X,S13)
 1500 FORMAT("0"," CLIPPING LEVEL                  ",F12.5)
 1501 FORMAT("0"," AUTOCORRELATION THRESHOLD       ",F12.5)
 1502 FORMAT("0"," VOICED/UNVOICED THRESHOLD       ",F12.5)
 1503 FORMAT("0"," WINDOW SHAPE                    ",F12.5)
 1504 FORMAT("0"," SPEECH SCALER                   ",F12.5)
 1505 FORMAT("0"," SILENCE THRESHOLD (SIFT PITCH)  ",F12.5)
 1506 FORMAT("0"," UNVOCD TO VOICED GAIN           ",F12.5)
 1507 FORMAT("0"," GLOTTAL PULSE POS. SLOPE        ",F12.5)
 1508 FORMAT("0"," GLOTTAL PULSE NEG. SLOPE        ",F12.5)
 1509 FORMAT("0"," SILENCE THR. IN PITCH           ",F12.5)
 1600 FORMAT("0"," POINTS/SET                      ",I6)
 1601 FORMAT("0"," FRAME SEPARATION                ",I6)
 1602 FORMAT("0"," PRE/DE-EMPHASIS(1-Y,0-N)        ",I6)
 1603 FORMAT("0"," GLOTTAL PULSE(1-YES,0-IMPULSE)  ",I6)
C     AUTHOR: KATHY DIXON WARD WITH MODIFICATION BY WILL JANSSEN

DATE  0  APIHL  w‘Li 

C     LANGUAGE: FORTRAN 5
C
C     FUNCTION: THIS PROGRAM CALCULATES THE LINEAR MAGNITUDE,
C               LOG MAGNITUDE, PHASE, AND IMPULSE RESPONSE OF
C               A DIGITAL FILTER FOR N POINTS. OPTIONS FOR
C               PLOTTING THE RESULTS ON THE TEKTRONIX 4010 AND
C               STORING THE RESULTS IN FILES OUT1 (MAGNITUDE,
C               PHASE) AND OUT2 (IMPULSE RESPONSE) ARE AVAILABLE.
C               INPUT DATA TO THE PROGRAM IS A FILE CONTAINING
C               M, N, A(I), B(I), AND A0 FROM THE FOLLOWING FORM
C               OF THE FILTER TRANSFER FUNCTION:
C
C               H(Z)=A0*(B(0)+B(1)*Z**(-1)+...+B(M)*Z**(-M))/
C                       (A(0)+A(1)*Z**(-1)+...+A(N)*Z**(-N))
C
C     LOAD COMMAND LINE: RLDR KEVAL PLOT10 PLOTS.LB GRPH.LB @FLIB@
C
C     NOTE: 1) THE LINK GRPH.LB MUST EXIST.
C           2) THE MAXIMUM NUMBER OF POINTS (NP) MUST BE
C              LESS THAN OR EQUAL TO 500.


      COMPLEX Y,SUM1,SUM2,SUM3
      DIMENSION RMAGLN(500),DEG(500),U(500),T(500),B(256),A(256)
      DIMENSION OUTFILE(7),RMAGLG(500),H(501),OUT1(7),OUT2(7)
      DIMENSION FILEOUT(7)
      INTEGER DB
      PI=3.14159
      M = 0
      NSOO = 1    ;ELIMINATES THE STORE OPTION
      NFIRST = 0
      NPP = 1
      ITEND = 0
      JCOUNT = 0
      NDOING = 0
      ISTARO = 0
      IYEA = 0
      DATA PPLT,IMR,LIN,DB/0,0,0,0/


C***  INPUT FILE ***
C
      TYPE"BASICALLY THIS PROGRAM CAN PROVIDE ONE OF 3 DIFFERENT <CR>
     X OUTPUTS. THE INPUT FILE CONTAINS A SET OF LPC PREDICTOR <CR>
     X COEFFICIENTS AND THE OUTPUT CAN BE IN THE FOLLOWING FORMS: <CR>
     X 1) TEKTRONIX PLOTS OF IMPULSE RESPONSE, PHASE RESPONSE, <CR>
     X    LINEAR MAG. RESPONSE, AND LOG MAG. RESPONSE FOR EACH <CR>
     X    FILTER SET <CR>
     X 2) AN OUTPUT FILE CAN BE CREATED AS AN INPUT TO A 3-D <CR>
     X    PLOT PROGRAM <CR> <7> "
      ACCEPT"CREATE 3-D FILE(1-Y,0-N) ",NWNOT
      IF(NWNOT .EQ. 0)GO TO 36
      ACCEPT"HOW MANY COEF. SETS ",NLONG
      ACCEPT"FILE NAME (OUTPUT)? "
      READ(11,40)FILEOUT(1)
      CALL DFILW(FILEOUT,IER)
      IF(IER .EQ. 13)GO TO 10
      IF(IER .NE. 1) TYPE "DELETE ERROR ",IER
   10 CONTINUE
      CALL CFILW(FILEOUT,2,IER)
      IF(IER .NE. 1)TYPE"CREATE LOG MAG FILE ERROR ",IER
      CALL OPEN(4,FILEOUT,3,IER)
      IF(IER .NE. 1) TYPE"OPEN ERROR ON LOG MAG OUTPUT ",IER
      ACCEPT"(1-FOR YOUR CHOICE OF START,0-IND. PLOTS)",IYEA
      IF(IYEA .EQ. 0)GO TO 12
      ACCEPT"HOW MANY IN ",NCIN
   12 CONTINUE
      NYS = 0
      GO TO 38
C
C***  CHECK FOR PRINTRONIX PLOT
C
   36 NYS=0
   38 ACCEPT"FILTER COEFFICIENTS FILE NAME? "
      READ(11,40)OUTFILE(1)
   40 FORMAT(S13)
      CALL OPEN(3,OUTFILE,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ",IER
      READ BINARY(3)NREAD
   46 READ BINARY(3)(A(I),I=2,NREAD+1)
      N = NREAD
      READ BINARY(3,END=9595)A0
      A(1) = 1.0
      B(1) = 1.0
      GO TO 9596
 9595 ITEND = 1
 9596 CONTINUE

      IF((IYEA .EQ. 1) .AND. ((JCOUNT+1) .LT. NCIN))GO TO 90
C
C**********SET UP FOR IMPULSE RESPONSE********************************
C
      H(1)=B(1)/A(1)
      IF(N .EQ. M) GO TO 62
      IF(N .LT. M) GO TO 60
      DO 37 I=M+1,N
   37 B(I+1)=0.
      D=N
      GO TO 63
   60 DO 61 I=N+1,M
   61 A(I+1)=0.
   62 D=M
C
C*******CALCULATE MAGNITUDE, PHASE, IMPULSE RESPONSE******************
C
   63 IF(NFIRST .EQ. 1)GO TO 70
      ACCEPT"ENTER NUMBER OF POINTS (MAX-100) ",NP
      IF(NYS .EQ. 1)NFIRST = 1
      IF(IYEA .EQ. 1)NFIRST = 1
   70 DO 90 I=1,NP
      Y=CMPLX(0.,(PI/NP)*(I-1))
      SUM1=0.
      SUM2=0.
      DO 72 J=1,M+1
   72 SUM1=SUM1+(B(J)*(CEXP(Y*(1-J))))
      DO 74 K=1,N+1
   74 SUM2=SUM2+(A(K)*(CEXP(Y*(1-K))))
      IF(SUM2 .EQ. 0.)TYPE"***WARNING--PROGRAM ATTEMPTING TO
     X EVALUATE A POLE***"
      IF(SUM2 .EQ. 0) SUM2=.1E-10
      SUM3=A0*(SUM1/SUM2)
      X=REAL(SUM3)
      RMAGLN(I) = (CABS(SUM3))**2
      W=RMAGLN(I)
      IF(W .EQ. 0) W=.1E-10
      RMAGLG(I)=10.*ALOG10(RMAGLN(I))
      IF(X .EQ. 0) X=.1E-10
      DEG(I)=ATAN2(AIMAG(SUM3),X)
      SUM4=0
      DO 88 L=1,I
      IF(D .GE. I)GO TO 88
      A(I+1)=0.
      B(I+1)=0.
   88 SUM4=A(I-L+2)*H(L)+SUM4
      H(I+1)=(-SUM4+B(I+1))/A(1)
   90 CONTINUE
      JCOUNT = JCOUNT + 1
   93 IF((IYEA .EQ. 1) .AND. (JCOUNT .GE. NCIN))GO TO 95
      GO TO 99
   95 IF(JCOUNT .GT. (NCIN+NLONG))GO TO 175
      WRITE BINARY(4) (RMAGLG(KI),KI=1,NP)
      TYPE"WROTE OUT ROW",(JCOUNT-NCIN)

C
C*********PLOT OPTIONS************************************************
C
   99 IFSCL=0
      NG=1
      N=NP
      IF(NYS .EQ. 1)GO TO 2000
      IF(IYEA .EQ. 1)GO TO 46
      IF(NFIRST .EQ. 1)GO TO 100
      ACCEPT"LINEAR MAGNITUDE PLOT? (2-CONNECTED POINTS,1-VERTICAL
     X LINES,0-NO) ",LIN
  100 CONTINUE
      IF(LIN .EQ. 0) GO TO 106
      IF(LIN .EQ. 1) MODE=1
      IF(LIN .EQ. 2) MODE=0
      DO 104 I=1,N
  104 T(I)=RMAGLN(I)
      CALL GRPH2("LINEAR MAGNITUDE",NG,T,U,N,MODE,YMIN,YMAX,IFSCL)
      CALL GCHAR(ICHAR,IER)
      IF(IER .NE. 1) TYPE "CONTINUE CHARACTER ERROR ",IER
  106 IF(NFIRST .EQ. 1)GO TO 107
      ACCEPT"LOG MAGNITUDE PLOT? (2-CONNECTED POINTS,1-VERTICAL
     X LINES,0-NO) ",DB
  107 CONTINUE
      IF(DB .EQ. 0) GO TO 111
      IF(DB .EQ. 1) MODE=1
      IF(DB .EQ. 2) MODE=0
      DO 109 I=1,N
  109 T(I)=RMAGLG(I)
      CALL GRPH2("LOG MAGNITUDE",NG,T,U,N,MODE,YMIN,YMAX,IFSCL)
      CALL GCHAR(ICHAR,IER)
      IF(IER .NE. 1)TYPE"CONTINUE CHARACTER ERROR ",IER
      IF((NWNOT .EQ. 1) .AND. (IYEA .EQ. 0))GO TO 110
      GO TO 111
  110 ACCEPT"START 3-D FILE HERE(1-Y,0-N)",NDGY
      NFIRST = 1
      IF(NDGY .EQ. 0)GO TO 46
      IYEA = 1
      IF(ISTARO .EQ. 1)GO TO 1101
      NCIN = JCOUNT
      ISTARO = 1
 1101 GO TO 93
  111 IF(NFIRST .EQ. 1)GO TO 112
      ACCEPT"PHASE PLOT? (2-CONNECTED POINTS,1-VERTICAL
     X LINES,0-NO) ",PPLT
  112 CONTINUE
      IF(PPLT .EQ. 0) GO TO 116
      IF(PPLT .EQ. 1) MODE=1
      IF(PPLT .EQ. 2) MODE=0
      DO 114 J=1,N
  114 T(J)=DEG(J)
      CALL GRPH2("PHASE",NG,T,U,N,MODE,YMIN,YMAX,IFSCL)
      CALL GCHAR(ICHAR,IER)
      IF(IER .NE. 1)TYPE"CONTINUE CHARACTER ERROR ",IER
  116 IF(NFIRST .EQ. 1)GO TO 117
      ACCEPT"IMPULSE RESPONSE PLOT? (2-CONNECTED POINTS,1-VERTICAL LINES,
     X 0-NO) ",IMR
  117 CONTINUE
      IF(IMR .EQ. 0) GO TO 170
      IF(IMR .EQ. 1)MODE=1
      IF(IMR .EQ. 2)MODE=0
      DO 119 I=1,N
  119 T(I)=H(I)
      CALL GRPH2("IMPULSE RESPONSE",NG,T,U,N,MODE,YMIN,YMAX,IFSCL)
      CALL GCHAR(ICHAR,IER)
      IF(IER .NE. 1) TYPE "CONTINUE CHARACTER ERROR ",IER
      GO TO 170
C
C***  PLOT OPTION FOR PRINTRONIX
C
 2000 CONTINUE
      SF = 1.0
      NPTS = NP
      DO 2010 J=1,NP
      T(J) = RMAGLG(J)
 2010 CONTINUE
      TYPE"RUNNING PLOT ",NPP,"NPTS",NPTS
      CALL PLOT10(T,NPTS,NPP,X0,Y0,SF)
      IF((ITEND .EQ. 1) .AND. (NPP .NE. 3))GO TO 2013
      IF(ITEND .EQ. 1)GO TO 2012
      IF(NPP .EQ. 10)NPP=0
      NPP = NPP + 1
      AGN = 1
      GO TO 174
 2012 NPP = NPP + 1
      CALL PLOT10(T,NPTS,NPP,X0,Y0,SF)
 2013 NPP = 10
      CALL PLOT10(T,NPTS,NPP,X0,Y0,SF)
      GO TO 175
C
C********REPEAT OPTION************************************************
C
  170 IF (ITEND .EQ. 1) GO TO 175
      ACCEPT"RUN AGAIN? (1-YES,0-NO) ",AGN
      NFIRST = 1
  174 IF (AGN .EQ. 1) GO TO 46
  175 CALL CLOSE(3,IER)
      IF(IER .NE. 1)TYPE"CLOSE FILE ERROR ",IER
      IF(IYEA .EQ. 0)GO TO 211
      CALL CLOSE(4,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON OUTPUT FILE ",IER
  211 CONTINUE
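
The frequency-response evaluation at the heart of KEVAL samples H(z) on the upper half of the unit circle, at angles pi*(i-1)/NP, and records the squared magnitude, 10*log10 of it, and the phase. The Python below is a minimal sketch of that calculation, assuming zero-based coefficient lists `b` and `a`; the function name `frequency_response` and the tuple layout are hypothetical. The 1e-11 guard values correspond to the listing's `.1E-10`.

```python
import cmath
import math


def frequency_response(b, a, a0, npts):
    """Evaluate H(z) = a0 * B(z) / A(z) at npts angles from 0 up to pi.

    Returns a list of (power, power_db, phase) tuples.
    """
    response = []
    for i in range(npts):
        w = math.pi * i / npts
        num = sum(bj * cmath.exp(-1j * w * j) for j, bj in enumerate(b))
        den = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        if den == 0:
            den = 1e-11                  # avoid evaluating exactly on a pole
        h = a0 * num / den
        power = abs(h) ** 2
        power_db = 10.0 * math.log10(power if power != 0 else 1e-11)
        response.append((power, power_db, cmath.phase(h)))
    return response
```

For a single real pole at 0.5 (`b = [1.0]`, `a = [1.0, -0.5]`, `a0 = 1.0`) the first point, at zero frequency, has a squared magnitude of 4, which is about 6.02 dB.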


C*********************************************************************
C
C     THIS ROUTINE CREATES AN OUTPUT SEQUENCE OF A DIGITAL
C     FILTER WITH A SIN WAVE INPUT. THE INPUT COEFFICIENTS CAN
C     COME FROM A FILE OR BE INPUT FROM A TERMINAL.
C
C     BE SURE YOU ARE ON THE TEKTRONIX 4010-1
C
C     LOAD LINE: RLDR FORMANT EXPAND POLYMLT PLOT10 PLOTS.LB GRPH.LB @FLIB@
C     COMMENTS:
C
C     KEVAL-REQUIRES AN INPUT IN THE FOLLOWING FORM
C           M-ORDER OF THE NUM
C           N-ORDER OF THE DEN
C           A(I)-DENOMINATOR COEFFICIENTS
C           B(I)-NUMERATOR COEFFICIENTS
C           A0-NUMERATOR GAIN
C
C     THE INPUT FILE FOR POLES AND ZEROS IS SET UP AS FOLLOWS
C           FIRST ENTRY-NUMBER OF ZEROS (INTEGER)
C           SECOND ENTRY-NUMBER OF POLES (INTEGER)
C           ZEROS IN PAIRS-FIRST THE REAL PART
C                         -SECOND THE IMAG PART
C           POLES IN PAIRS-FIRST THE REAL PART
C                         -SECOND THE IMAG PART
C           GAIN
C
C     APR-ARRAY OF REAL PARTS OF THE ROOTS
C     API-ARRAY OF IMAGINARY PARTS OF THE ROOTS
C     GAIN-OVERALL GAIN
C     NPOLE OR MZER-NUMBER OF POLES OR ZEROS RESPECTIVELY
C*********************************************************************
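
The synthesis step this program performs, a direct-form recursive filter driven by a periodic unit impulse, can be sketched as follows. The Python is an illustration under assumptions: `a[0]` is taken to be 1 (the program normalizes the denominator), the coefficient lists are zero-based, and the name `impulse_train_response` is hypothetical. The internal state `w` plays the role of the W array in the listing, and the impulse period corresponds to NPER = INT(8000.0/frequency).

```python
def impulse_train_response(b, a, a0, period, nsamples):
    """Filter a unit impulse train (one impulse every `period` samples)
    through H(z) = a0 * B(z) / A(z), assuming a[0] == 1."""
    order = max(len(a), len(b)) - 1
    w = [0.0] * (order + 1)              # intermediate (state) values
    output = []
    for t in range(nsamples):
        s = 1.0 if t % period == 0 else 0.0
        # Recursive part: new state from the input and the past states.
        w[0] = s * a0 - sum(a[k] * w[k] for k in range(1, len(a)))
        # Non-recursive part: output as a weighted sum of the states.
        output.append(sum(b[k] * w[k] for k in range(len(b))))
        # Shift the delay line by one sample.
        for k in range(order, 0, -1):
            w[k] = w[k - 1]
    return output
```

With `b = [1.0]`, `a = [1.0, -0.5]`, `a0 = 1.0` and a period of 4, the output decays by half each sample until the next impulse arrives.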

C     PROGRAM FORMANT
      DIMENSION OUTFILE(7),INFILE(7),T(256),A(0:50),B(0:50),APR(50),
     X API(50),BPI(50),W(0:50),BPR(50),POLY(51),OUT2(7),Y(256)
      INTEGER MNK(256),YES
      TYPE"THE MAX COEFFICIENTS ALLOWED ARE 50 EACH"
      GAIN = 1.0
      NPOS = 0
      S = 1.0
      NFILES = 0
C
C***  ZERO INTERMEDIATE FILTER POINTS
C
      DO 15 K=0,50
      W(K) = 0.0
   15 CONTINUE
      PI = 3.1416
      ACCEPT"FILE NAME (OUTPUT)? "
      READ(11,39)OUTFILE(1)
   39 FORMAT(S13)
      CALL DFILW(OUTFILE,IER)
      IF(IER .EQ. 13) GO TO 10
      IF (IER .NE. 1) TYPE "DELETE ERROR ",IER
C
C***  INSERT NO. OF BLOCKS
C
   10 ACCEPT"HOW MANY BLOCKS? (50 OR LESS) ",IBLCKS
      CALL CFILW(OUTFILE,3,IBLCKS,IER)
      IF (IER .NE. 1) TYPE "CREATE FILE ERROR ",IER
      CALL OPEN(1,OUTFILE,3,IER)
      IF (IER .NE. 1) TYPE "OPEN ERROR ",IER
      N = 256 * IBLCKS
C
C***  INPUT INFORMATION (IMPULSE FREQUENCY INPUT)
C
      ACCEPT"WHAT IS THE IMPULSES INPUT FREQUENCY? ",IFREQ1
      NPER = INT(8000.0/FLOAT(IFREQ1))
C
C***  SEE IF INPUT IS IN A FILE
C
      ACCEPT"ARE FILTER COEFFICIENTS IN A FILE?(1-YES,0-NO) ",YES
      IF (YES .EQ. 0)GO TO 30
      ACCEPT"FILE NAME(INPUT)? "
      READ(11,39)INFILE(1)
      CALL OPEN(3,INFILE,1,IER)
      IF(IER .NE. 1)TYPE"OPEN ERROR ON INFILE ",IER
      ACCEPT"INPUT POLES AND ZEROS(1-YES,0-NO)? ",YES
      IF(YES .EQ. 1)GO TO 20
C
C***  INPUT DIRECT FORM
C
      READ FREE(3) MNUM
      READ FREE(3) NDEN
      MNUM = MNUM
      NDEN = NDEN
      READ FREE(3)(B(I),I=0,MNUM)
      READ FREE(3)(A(I),I=0,NDEN)
      READ FREE(3)A0
      GO TO 58
C
C***  INPUT POLES
C
   20 READ FREE(3) MZER
      READ FREE(3) NPOLE
      IF (MZER .EQ. 0)GO TO 21
      DO 21 I=1,MZER
      READ FREE(3) BPR(I)
      READ FREE(3) BPI(I)
   21 CONTINUE
      DO 22 I=1,NPOLE
      READ FREE(3) APR(I)
      READ FREE(3) API(I)
   22 CONTINUE
      READ FREE(3) A0
      MNUM = MZER + 1
      NDEN = NPOLE + 1
      GAIN = 1.0
      IF (MZER .EQ. 0)B(0)=1.0
C
C     FIND THE POLE POLYNOMIAL
C
      CALL EXPAND(APR,API,GAIN,NPOLE,POLY)
      A0 = A0/POLY(NDEN)
      DO 29 IV=1,NDEN
      POLY(IV) = POLY(IV)/POLY(NDEN)
   29 CONTINUE
      DO 24 IV=1,NDEN
      NVER = NPOLE + 2 - IV
      A(IV) = POLY(NVER)
   24 CONTINUE

      GAIN = 1.0
      CALL EXPAND(BPR,BPI,GAIN,MZER,POLY)
      DO 26 IV=1,MNUM
      NVER = MNUM + 2 - IV
      B(IV) = POLY(NVER)
   26 CONTINUE
      ACCEPT"SAVE THE RESULTS IN DIR. FORM (1-Y,0-N)? ",YES
      IF(YES .EQ. 0)GO TO 58
      ACCEPT"FILENAME?(OUTPUT) "
      READ(11,39)OUT2(1)
      CALL DFILW(OUT2,IER)
      IF(IER .EQ. 13)GO TO 27
      IF(IER .NE. 1)TYPE"DELETE ERROR ",IER
   27 CONTINUE
      CALL CFILW(OUT2,2,IER)
      IF(IER .NE. 1)TYPE"CREATE FILE ERROR OUT2 ",IER
      CALL OPEN(2,OUT2,3,IER)
      IF(IER .NE. 1)TYPE" OPEN ERROR ON OUT2 ",IER
      WRITE(2,99)MNUM
      WRITE(2,99)NDEN
      DO 111 JB=0,MNUM
      WRITE(2,98)B(JB)
  111 CONTINUE
      DO 112 JB=0,NDEN
      WRITE(2,98)A(JB)
  112 CONTINUE
      WRITE(2,98)A0
   98 FORMAT(1X,F20.3)
   99 FORMAT(1X,I4)
      CALL CLOSE(2,IER)
      IF(IER .NE. 1)TYPE"CLOSE ERROR ON OUT2 ",IER
      GO TO 58
   30 NFILES = 1
      ACCEPT"WILL THE INPUT BE POLES AND ZEROS(1-YES,0-NO) ",YES
      IF(YES .EQ. 1)GO TO 41
C
C***  INPUT DIRECT FORM FROM CONSOLE
C
      TYPE "THE TRANSFER FUNCTION IS SET UP AS FOLLOWS "
      TYPE " "
      TYPE "H(Z)=A0*(B(0)+B(1)*Z**(-1)+...+B(M)*Z**(-M))/"
      TYPE "        (A(0)+A(1)*Z**(-1)+...+A(N)*Z**(-N)) "
      TYPE " "
      ACCEPT"M=? ",MNUM
      ACCEPT"N=? ",NDEN
C
C***  NOTE: THE VALUES ARE SHIFTED UP IN THE ARRAY
C
      DO 33 KFIL=0,MNUM
      KD = KFIL
      TYPE"B(",KD,") = "
      ACCEPT B(KFIL)
   33 CONTINUE
      DO 40 KFIL=0,NDEN
      KD = KFIL
      TYPE"A(",KD,") = "
      ACCEPT A(KFIL)
   40 CONTINUE
      ACCEPT "A0? = ",A0
      GO TO 58
C 

C*** INPUT  POLES  AND  ZEROS  FROM  CONSOLE 
C 

41  BIO)  *  1.  O 

ACCEPT “HOW  MANY  POLES?  NPOLE 

ACCEPT “HOW  MANY  ZEROS?  MZER 

DO  42  KFIL-1.  NPOLE 

ACCEPT “REAL  PART  OF  POLE-  APR (KFIL) 

ACCEPT- IMAG  PART  OF  POLE-  “. API (KFIL) 

42  CONTINUE 

IF1MZER  .  EO.  0)00  TO  43 
DO  43  KFIL-1. MZER 

ACCEPT-REAL  PART  OF  ZERO'*  DPR  (KFIL) 

ACCEPT- IMAG  PART  OF  ZERO-  “.DPI (KFIL) 

43  CONTINUE 
ACCEPT-OAIN  -  “.AO 
MNUM = MZER
NDEN = NPOLE
GAIN = 1.0
IF (MZER .EQ. 0) MNUM=0
X  TYPE"RUN EXPAND POLES"
CALL EXPAND(APR,API,GAIN,NPOLE,POLY)
X  TYPE"RAN EXPAND POLES"
A0 = A0/POLY(NDEN+1)
DO 44 IV=1,NDEN+1
POLY(IV) = POLY(IV)/POLY(NDEN+1)
44  CONTINUE
DO 45 IV=1,NDEN+1
NVER = NPOLE + 2 - IV
IV5 = IV - 1
A(IV5) = POLY(NVER)
45  CONTINUE
X  DO 47 MN5=0,NDEN
X  TYPE"MN5 = ",A(MN5)
X  47  CONTINUE
GAIN = 1.0
IF(MZER .EQ. 0)GO TO 46
X  TYPE"RUN EXPAND ZEROS "
CALL EXPAND(BPR,BPI,GAIN,MZER,POLY)
X  TYPE"RAN EXPAND ZEROS"
IF(MZER .EQ. 0)GO TO 46
DO 46 IV=1,MNUM+1
NVER = MNUM + 2 - IV
IV5 = IV - 1
B(IV5) = POLY(NVER)
46  CONTINUE
IF(MZER .NE. 0)GO TO 48
B(1) = 1.0
48  ACCEPT"SAVE RESULTS IN DIRECT FORM?(1-Y, 0-N) ",YES
IF(YES .EQ. 0)GO TO 58
ACCEPT"FILENAME(OUTPUT)? "
READ(11,39)OUT2(1)
CALL DFILW(OUT2,IER)
IF(IER .EQ. 1P)GO TO 36
IF(IER .NE. 1)TYPE"DELETE ERROR ",IER
36  CONTINUE
CALL CFILW(OUT2,2,IER)
IF(IER .NE. 1)TYPE"CREATE FILE ERROR ",IER
CALL OPEN(2,OUT2,3,IER)
IF(IER .NE. 1)TYPE"OPEN ERROR ",IER
WRITE(2,99)MNUM
WRITE(2,99)NDEN
DO 1110 JB=0,MNUM
WRITE(2,98)B(JB)
1110  CONTINUE
DO 1120 JB=0,NDEN
WRITE(2,98)A(JB)
1120  CONTINUE
WRITE(2,98)A0
CALL CLOSE(2,IER)
IF(IER .NE. 1)TYPE"CLOSE ERROR ",IER
58  CONTINUE

DO 75 J=1,IBLCKS
K = J - 1
DO 65 L = 1,256
IF(NPOS .NE. NPER)GO TO 59   ; SETS AN IMPULSE AT INPUT FREQ
S = 1.0
NPOS = 0
59  CONTINUE
NPOS = NPOS + 1
TEMP = 0.0
DO 60 KV=1,NDEN
TEMP=-A(KV)*W(KV)+TEMP
60  CONTINUE
W(0)=TEMP+S*A0
TEMP1 = 0.0
DO 61 KV=0,MNUM
TEMP1=B(KV)*W(KV)+TEMP1
61  CONTINUE
IF(NDEN .LT. MNUM)GO TO 62
MEND = NDEN
GO TO 63
62  MEND = MNUM
63  DO 64 IKR=1,MEND
MKD = MEND + 1 - IKR
W(MKD) = W(MKD - 1)
64  CONTINUE
W(0) = 0.0
Y(L) = TEMP1
S = 0.0
MNK(L) = INT(TEMP1)
65  CONTINUE
CALL WRBLK(1,K,MNK,1,IER)
IF (IER.NE.1) TYPE "WRITE BLOCK ERROR ",IER
IF (J.NE.1) GO TO 75
DO 70 L=1,256
T(L)=Y(L)
70  CONTINUE
75  CONTINUE
TYPE"COMPLETED OUTPUTTING THE IMPULSE RESPONSE "
CALL CLOSE(1,IER)

ACCEPT"DO YOU WANT A PLOT?(1-Y, 0-N) ",NYES
IF(NYES .NE. 1)GO TO 9600
ACCEPT"USE PRINTRONICS PLOTTER?(1-Y, 0-N) ",NO
IF(NO .EQ. 0)GO TO 3000
NP = 1
SF = 1.0
NPTS = 256
CALL PLOT10(T,NPTS,NP,XO,YO,SF)
NP = 10
CALL PLOT10(T,NPTS,NP,XO,YO,SF)
GO TO 9600
3000  IFSCL = 0
MODE = 0
NG = 1
N = 256
CALL GRPH2(OUTFILE,NG,T,U,N,MODE,YM,YA,IFSCL)
IF(NFILES .EQ. 1)GO TO 9600
CALL CLOSE(3,IER)
IF(IER .NE. 1)TYPE"CLOSE ERROR ",IER
9600  CONTINUE
STOP
END


SUBROUTINE EXPAND(ROOTR,ROOTI,GAIN,NF,POLY)
C
C  THIS SUBROUTINE EXPANDS ROOTS INTO A POLYNOMIAL
C
C  INPUTS: ROOTR -THIS IS A SET OF REAL ROOTS
C          ROOTI -THIS IS A SET OF IMAGINARY ROOTS
C          GAIN  -SYSTEM GAIN
C          NF    -IS A CHECK FOR NO FILTER (FOR NF=0)
C  OUTPUT: POLY  -THIS IS THE OUTPUT POLYNOMIAL
C
DIMENSION ROOTR(50),ROOTI(50),POLY(51),BPOLY(51),APOLY(3)
CLOSE = 0.0000001
DO 2 I=1,51
2  POLY(I) = 0
IF(NF .NE. 0)GO TO 4
POLY(1) = GAIN
RETURN
4  I=1
IF(ABS(ROOTI(I)).GT.CLOSE)GO TO 10
NBP=1
BPOLY(1)=1
BPOLY(2)=-ROOTR(I)
IF(NBP.EQ.NF)GO TO 880
I=I+1
GO TO 20
10  NBP=2
BPOLY(1)=1
BPOLY(2)=-2*ROOTR(I)
BPOLY(3)=ROOTR(I)**2+ROOTI(I)**2
IF(NBP.EQ.NF)GO TO 880
I=I+2
20  IF(ABS(ROOTI(I)).GT.CLOSE)GO TO 30
NAP=1
APOLY(1)=1
APOLY(2)=-ROOTR(I)
GO TO 40
30  NAP=2
APOLY(1)=1
APOLY(2)=-2*ROOTR(I)
APOLY(3)=ROOTR(I)**2+ROOTI(I)**2
40  CALL POLYMLT(APOLY,BPOLY,POLY,NAP,NBP,NP,1.0,1.0)
N=NP+1
IF(NP.GE.NF)GO TO 990
DO 43 J=1,N
43  BPOLY(J)=POLY(J)
NBP=NP
I=I+NAP
GO TO 20
880  DO 881 I=1,3
881  POLY(I)=BPOLY(I)
N=NBP+1
990  DO 991 I=1,N
991  POLY(I)=POLY(I)*GAIN
RETURN
END


SUBROUTINE POLYMLT(A,B,C,NA,NB,NC,AK,BK,CK)
C
C  THIS SUBROUTINE COMPUTES A POLYNOMIAL
C
DIMENSION A(51),B(51),C(51)
NC=NA+NB
IF(NC.LE.50)GO TO 2
TYPE"DEGREE OF RESULT IS GREATER THAN 50 "
RETURN
2  DO 1 I=1,51
1  C(I)=0
NAA=NA+1
NBB=NB+1
DO 3 I=1,NAA
DO 3 J=1,NBB
3  C(I+J-1)=C(I+J-1)+A(I)*B(J)
CK=AK*BK
RETURN
END


User Manual

The user manual is divided into five parts. The first
part gives the run lines for each of the vocoder parts:
the pitch detectors, the speech analyzer and the speech
synthesizer. The run lines use generic file names such as
FILE1.

The  second,  third  and  fourth  parts  will  contain  all  the 
information  required  to  run  the  pitch  detector,  speech 
analyzer  and  speech  synthesizer  respectively.  The  final 
section  will  describe  how  to  use  the  utility  programs  which 
convert  various  vocoder  information  into  convenient  analysis 
forms  such  as  plots. 

Run  Lines 

The run lines for the vocoder are shown below using
FILE# as a generic file name. The user can substitute any
legal file name (see the Data General user manuals for
details) for a FILE#. Once a particular file number has been
chosen below (for example, FILE5), that number always refers
to the same file, so that it is clear where each file is
used. Following the run lines, a description of the files is
given so that the user can substitute his own files in cases
where programs other than those created in this thesis are
used. For example, any other pitch detector could be used
provided its pitch period file has the same format.

The run line for the AUTOC pitch detector (program PITCH2)
is:

PITCH2 FILE1/I FILE2/O FILE3/D FILE4/F

The run line for the SIFT pitch detector is:

SIFT FILE1/I FILE2/O FILE3/D

The run line for the speech analyzer (program CODER) is:

CODER FILE1/I FILE3/D FILE5/P

The run line for the speech synthesizer is:

SYNTH FILE3/D FILE5/P FILE6/S FILE2/O

Any of the four programs (PITCH2, SIFT, CODER or SYNTH) can
be run by typing its run line on the Data General's command
line.
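
As an illustration of how a run line is put together, the following Python sketch splits one into the program name and a table of switch letters and file names. The function name `parse_run_line` is hypothetical and is not part of the vocoder software; the sketch only assumes the `NAME/SWITCH` form visible in the run lines above.

```python
def parse_run_line(line):
    """Split a run line into the program name and a {switch: file name} map."""
    program, *args = line.split()
    files = {}
    for arg in args:
        # "FILE1/I" -> name "FILE1", switch "I"
        name, _, switch = arg.rpartition("/")
        if not name:
            raise ValueError(f"argument {arg!r} has no /switch")
        files[switch.upper()] = name
    return program, files


program, files = parse_run_line("CODER FILE1/I FILE3/D FILE5/P")
print(program, files)
```

Keying the table on the switch rather than on position reflects the way the same file (for example FILE3/D) appears at different positions in different run lines.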

A  description  of  each  file's  content  and  how  it  is 
formatted  is  given  below: 

FILE1: This file contains the input speech that will be
processed by the vocoder. The speech is an integer file in
a binary format (requiring either a binary read or a read
block to input the data). The data is in two's complement
form with a maximum absolute value of 32,768. FILE1 is used
as an input to both pitch detectors and the speech analyzer.
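
A minimal sketch of decoding such samples outside the Data General system is given here. The maximum magnitude of 32,768 implies 16-bit words; the byte order within each word is not stated in this manual, so the big-endian default below is an assumption, and `decode_speech` is a hypothetical helper name.

```python
import struct


def decode_speech(raw, byteorder=">"):
    """Decode packed 16-bit two's complement samples into Python ints.

    byteorder is a struct prefix: ">" (big-endian, assumed) or "<".
    """
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes")
    count = len(raw) // 2
    return list(struct.unpack(f"{byteorder}{count}h", raw))


raw = bytes([0x00, 0x01, 0xFF, 0xFF, 0x80, 0x00, 0x7F, 0xFF])
print(decode_speech(raw))  # [1, -1, -32768, 32767]
```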

FILE2: This file contains a silence/speech indicator, a
voiced/unvoiced indicator (if speech) and the pitch period
for voiced speech segments. Each record contains three
integers which give the pitch information for one segment
of 80 speech samples. The first integer in the record is a
one if the speech segment is voiced and a zero if it is
unvoiced or silence. The second integer is a one if the
segment is speech or a zero for silence. The third integer
is the pitch period and ranges in value from 0 to 160. If
the speech segment is not voiced then the pitch period will
equal zero. The Fortran format for each record is:

FORMAT (3X,3(I10,3X))

FILE2 can be an output of either pitch detector and is an
input to both the speech analyzer and synthesizer.
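
The record layout implied by FORMAT (3X,3(I10,3X)) is three blank columns followed by three right-justified 10-column integers, each followed by three blanks. A small Python sketch of writing and reading one such record follows; the helper names and the dictionary keys are illustrative only.

```python
def format_pitch_record(voiced, speech, period):
    """Build one record laid out as FORMAT (3X,3(I10,3X))."""
    return "   " + "".join(f"{v:10d}   " for v in (voiced, speech, period))


def parse_pitch_record(line):
    """Read the three I10 fields, which start at columns 4, 17 and 30."""
    fields = [line[start:start + 10] for start in (3, 16, 29)]
    voiced, speech, period = (int(f) for f in fields)
    return {"voiced": bool(voiced), "speech": bool(speech), "period": period}


record = format_pitch_record(1, 1, 57)
print(repr(record))
print(parse_pitch_record(record))
```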

FILE3: This file contains the initialization parameters
that are used by all the vocoder programs. The file is
composed of 50 records where the first 25 are real and the
second 25 are integers. Not all of the records are used;
the unused ones are available for future expansion. The
program SETUP (to be described in the utility section) is
used to update this file. What follows is a list of the
records and the initialization parameters they represent:

(Real Records)

1)  The clipping level used in program PITCH2;

2)  The autocorrelation threshold used in program
PITCH2;

3)  The voiced/unvoiced threshold used in PITCH2;

4)  The variable alpha (the window shape) used
in CODER;

5)  The speech scale used in CODER;

6)  The silence threshold used in SIFT;

7)  The unvoiced to voiced gain used in SYNTH;

8)  The glottal pulse positive slope size used
in SYNTH;

9)  The glottal pulse negative slope size used
in SYNTH;

10)  The silence threshold used in PITCH2;

11)  thru 25) are not currently used.

(Integer Records)

26)  The number of points in each 1/3 frame used
in PITCH2;

27)  The number of points between each filter
update used in CODER and SYNTH;

28)  The pre/de-emphasis switch (1-if used,
0-if not) used in CODER and SYNTH;

29)  The glottal pulse or impulse switch (1-for
glottal pulse, 0-for impulse) used in CODER
and SYNTH;

30)  thru 50) are not currently used.
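
The record numbering above can be captured in a small lookup table. The Python sketch below assumes the 50 records have already been read into a list in file order (how they are stored on disk is not described here), and the names `REAL_RECORDS`, `INTEGER_RECORDS` and `describe` are hypothetical.

```python
REAL_RECORDS = {
    1: "clipping level (PITCH2)",
    2: "autocorrelation threshold (PITCH2)",
    3: "voiced/unvoiced threshold (PITCH2)",
    4: "alpha, window shape (CODER)",
    5: "speech scale (CODER)",
    6: "silence threshold (SIFT)",
    7: "unvoiced to voiced gain (SYNTH)",
    8: "glottal pulse positive slope size (SYNTH)",
    9: "glottal pulse negative slope size (SYNTH)",
    10: "silence threshold (PITCH2)",
}
INTEGER_RECORDS = {
    26: "points in each 1/3 frame (PITCH2)",
    27: "points between filter updates (CODER, SYNTH)",
    28: "pre/de-emphasis switch (CODER, SYNTH)",
    29: "glottal pulse or impulse switch (CODER, SYNTH)",
}


def describe(records):
    """Map the used records of a 50-entry parameter list to their names."""
    if len(records) != 50:
        raise ValueError("FILE3 is described as exactly 50 records")
    named = {name: float(records[n - 1]) for n, name in REAL_RECORDS.items()}
    named.update({name: int(records[n - 1]) for n, name in INTEGER_RECORDS.items()})
    return named


records = [0.0] * 25 + [0] * 25
records[0] = 0.6   # record 1
records[26] = 1    # record 27
print(describe(records)["clipping level (PITCH2)"])
```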

FILE4: This is a file of filter coefficients that are
the output of the program WFILTER, which is a user program
on the Data General system. These filter coefficients are
used in the input low pass filter (LPF) in PITCH2. The first
record (free-format reads are used, so the location on each
line of the file doesn't matter) is the integer number of
zeros in the LPF and is always assumed to be zero. The
second record is always the integer zero. The maximum number
of zeros allowed is 120, but 99 are usually used. The rest
of the records in the file are real filter coefficients.


FILE5: This file contains the LPC predictor
coefficients which are the output of the speech analyzer
(CODER) and the input to the speech synthesizer (SYNTH).
The file is a binary file where the first value is the
integer order (p) of the predictor coefficients. The rest
of the file consists of sets of real predictor coefficients
with p predictor coefficients per set.
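
Once the values in such a file have been decoded (the binary encoding of the Data General integers and reals is not covered here, so the sketch starts from already decoded numbers), grouping them into coefficient sets is straightforward. The function name below is illustrative.

```python
def split_coefficient_sets(order, values):
    """Group a flat list of predictor coefficients into sets of `order`."""
    if order <= 0:
        raise ValueError("predictor order must be positive")
    if len(values) % order:
        raise ValueError("file does not hold a whole number of sets")
    return [values[i:i + order] for i in range(0, len(values), order)]


# Two sets of a (hypothetical) order-2 predictor.
print(split_coefficient_sets(2, [1.2, -0.5, 1.1, -0.4]))
```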

FILE6: This file contains the output speech produced by
the speech synthesizer (SYNTH). This is a binary file
consisting of integers.

Pitch  Detectors 

How to run the pitch detectors has been described in the
run line section of this thesis. In addition, the software
organization and the algorithm descriptions were given in
Appendix A and Chapter III respectively. This section
provides the load lines and a short description of each
subroutine's primary function.

The  load  line  for  AUTOC  is: 

RLDR  PITCH2  IOF  LPF  ENERGY  SETMAX  CLPPER  AUTCOR  @FLIB@ 

A  short  description  of  each  subroutine  follows: 

IOF-This subroutine inputs the file names on the load
line and makes them available to the program PITCH2.

LPF-This subroutine low pass filters the input speech
(usually at 900 Hz).

ENERGY-This subroutine calculates the energy in a frame
and compares it to a silence threshold. If the energy
exceeds the threshold the frame is considered speech;
otherwise it is silence.

SETMAX-This subroutine finds the maximum value in the
first and last third of the frame. The smaller of these
two values is the clipping level.

CLPPER-This subroutine does infinite peak and zero
clipping. Any value larger than the clipping level is set
to 1.0 and any value smaller than the negative of the
clipping level is set to -1.0. Any values in between are
set to 0.0.

AUTCOR-This subroutine calculates the autocorrelation
for a frame and then finds the fundamental peak.
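
The chain ENERGY, SETMAX, CLPPER and AUTCOR can be sketched in a few lines of Python. This is not the thesis code: the function names are illustrative, the use of absolute values in the peak search is an assumption, and so is the `fraction` argument, which stands in for the clipping level parameter of FILE3 (how PITCH2 combines that parameter with the value found by SETMAX is not spelled out in this manual).

```python
def frame_energy(frame):
    return sum(x * x for x in frame)


def is_speech(frame, silence_threshold):
    """ENERGY: speech if the frame energy exceeds the silence threshold."""
    return frame_energy(frame) > silence_threshold


def clipping_level(frame, fraction=1.0):
    """SETMAX: smaller of the peaks in the first and last third of the frame.

    `fraction` scales the result; treating it as a multiplier is an assumption.
    """
    third = len(frame) // 3
    first_peak = max(abs(x) for x in frame[:third])
    last_peak = max(abs(x) for x in frame[-third:])
    return fraction * min(first_peak, last_peak)


def clip(frame, level):
    """CLPPER: map each sample to 1.0, -1.0 or 0.0."""
    return [1.0 if x > level else -1.0 if x < -level else 0.0 for x in frame]


def autocorrelation_peak(clipped, min_lag, max_lag):
    """AUTCOR: lag (in samples) with the largest autocorrelation value."""
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(clipped[n] * clipped[n + lag]
                    for n in range(len(clipped) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value


# Synthetic frame: one pulse every 40 samples.
frame = [100.0 if n % 40 == 0 else 0.0 for n in range(240)]
if is_speech(frame, 1000.0):
    clipped = clip(frame, clipping_level(frame, fraction=0.6))
    print(autocorrelation_peak(clipped, 20, 160))
```

Because the clipped signal only takes the values -1, 0 and 1, the products in the autocorrelation reduce to sign comparisons, which is the usual attraction of this kind of clipping.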

The  load  line  for  SIFT  is: 

RLDR SIFT SIFTA DIRECT AUTO ENER IOF @FLIB@

A short description of each subroutine follows:

SIFTA-This subroutine performs inverse filtering on the
speech.

DIRECT-This subroutine implements a direct form filter
and is called by SIFTA for low pass filtering the input
speech.

AUTO-This subroutine calculates the autocorrelation
function and is called by SIFTA.

ENER-This subroutine computes the energy in a frame and
compares it to a silence threshold. If the energy exceeds
the threshold the frame is considered speech; otherwise it
is silence.

IOF-This subroutine inputs the file names on the load
line and makes them available to the program SIFT.
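
For readers unfamiliar with the term, inverse filtering means passing the speech through the prediction error filter so that the output is the residual e(n) = s(n) - sum over k of a(k) s(n-k). A minimal Python sketch of that one step is shown here; it is not SIFTA itself, and the sign convention for the predictor coefficients is an assumption.

```python
def inverse_filter(speech, predictor):
    """Return the prediction residual e(n) = s(n) - sum_k a_k * s(n - k).

    predictor[0] is a_1. Samples before the start are taken as zero.
    """
    residual = []
    for n, sample in enumerate(speech):
        prediction = sum(a * speech[n - k]
                         for k, a in enumerate(predictor, start=1)
                         if n - k >= 0)
        residual.append(sample - prediction)
    return residual


# A first order decay is perfectly predicted after its first sample.
print(inverse_filter([1.0, 0.5, 0.25, 0.125], [0.5]))
```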


Speech Analyzer

How to run the speech analyzer has been described in the
run line section of this thesis. The software organization
and the algorithm descriptions were given in Appendix A and
Chapter III respectively.

This section provides the load line and a short
description of each subroutine's primary function.

The  load  line  for  CODER  is: 

RLDR  CODER  RLPC  INFIL  IOF  @FLIB@ 

A  short  description  of  each  subroutine  follows: 

RLPC-This subroutine recursively calculates the
autocorrelation of the input speech.

INFIL-This subroutine inverts the autocorrelation
function to produce a set of predictor coefficients.

IOF-This subroutine inputs the file names on the load
line and makes them available to the program CODER.
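
To make the two analysis steps concrete, a hedged Python sketch follows. The recursive update shown is a simple first order (exponentially weighted) form, R_k(n) = alpha R_k(n-1) + s(n) s(n-k), chosen for illustration; the exact recursion used by RLPC is described in Chapter III and may differ. Likewise, the Levinson-Durbin recursion is one standard way of turning autocorrelation values into predictor coefficients and is not claimed to be the method inside INFIL. All function names are hypothetical.

```python
def update_autocorrelation(r, history, sample, alpha):
    """One recursive step: r[k] <- alpha * r[k] + s(n) * s(n - k).

    history holds the previous samples, newest first (history[0] = s(n-1)),
    and must have len(r) - 1 entries.
    """
    past = [sample] + history
    new_r = [alpha * r[k] + sample * past[k] for k in range(len(r))]
    return new_r, past[:len(r) - 1]


def levinson_durbin(r, order):
    """Solve for a_1..a_order with s(n) ~ sum_k a_k s(n-k); returns (a, error)."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


r, history = [0.0, 0.0], [0.0]
for s in (1.0, 2.0):
    r, history = update_autocorrelation(r, history, s, alpha=0.5)
print(r)
print(levinson_durbin([1.0, 0.5, 0.25], 2))
```

The appeal of a recursive update is that the autocorrelation values are available after every sample, so the predictor coefficients can be refreshed as often as the update interval in FILE3 requests.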

Speech  Synthesizer 

How to run the speech synthesizer has been described in
the run line section of this thesis. The software
organization and the algorithm descriptions were given in
Appendix A and Chapter III respectively. This section
provides the load line and a short description of each
subroutine's primary function.

The  load  line  for  SYNTH  is: 

RLDR  SYNTH  VOICED  THROAT  UNVOCD  IOF  @FLIB@ 

A  short  description  of  each  subroutine  follows: 

VOICED-If a segment of speech has been determined to be
voiced this subroutine produces a glottal pulse or an
impulse.

THROAT-This subroutine takes as its input the glottal
pulse for voiced speech or noise for unvoiced speech. The
appropriate input is put through a digital filter whose
filter coefficients are the current LPC predictor
coefficients.

UNVOCD-If a segment of speech has been determined to be
unvoiced this subroutine outputs normally generated noise.

IOF-This subroutine inputs the file names on the load
line and makes them available to the program SYNTH.
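
The division of labor among VOICED, UNVOCD and THROAT can be sketched as an excitation source feeding an all-pole filter. The Python below is a simplified illustration, not the SYNTH code: the glottal pulse shape is omitted (only the impulse option is shown), the noise is drawn from Python's Gaussian generator as a stand-in, and the convention s(n) = gain e(n) + sum over k of a(k) s(n-k) is an assumption.

```python
import random


def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def noise(length, seed=0):
    """Unvoiced excitation: pseudo-random noise (Gaussian here, as a stand-in)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def all_pole_filter(excitation, predictor, gain=1.0):
    """s(n) = gain * e(n) + sum_k a_k * s(n - k), with predictor[0] = a_1."""
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(a * out[n - k]
                       for k, a in enumerate(predictor, start=1)
                       if n - k >= 0)
        out.append(gain * e + feedback)
    return out


print(all_pole_filter(impulse_train(4, 100), [0.5]))
print(len(all_pole_filter(noise(80), [0.5])))
```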

Utility  Programs 

The utility programs described in the user manual are
interactive and don't use load lines for inputting file
names. To run one of these programs, type the program name
followed by a carriage return. Some of the files described
earlier (for example, FILE2) will be used here because the
utility programs operate on various vocoder output files.
The five utility programs are SCALE1, PLTTER, SETUP, KEVAL
and FORMANT.

The first program, SCALE1, uses the vocoder synthesized
speech output and scales the speech to the proper magnitude
for use with the D/A converter. The second program, PLTTER,
plots either the pitch detector output or a speech file.
The third program, SETUP, allows the user to update a file
of parameters that are the initialization values for the
vocoder. The fourth program, KEVAL, lets the user create a
file which is an input file for the 3-D plotter program
PLOT3D. Note: PLOT3D was not created in this thesis; for
details on using it see the book of user programs. The
final program, FORMANT, allows the user to create an
artificial vowel using poles or a set of filter
coefficients.

SCALE1  requires  one  input  speech  file  which  can  either 
be  FILE1  or  another  speech  file  formatted  in  the  same 
manner.  The  output  is  a  speech  file  that  has  been  scaled  to 
a  maximum  of  2000.0.  One  additional  option  allows  uniformly 
distributed  noise  to  be  added  to  the  speech  file.  The 
speech  file  plus  noise  is  still  scaled  to  a  maximum  of 
2000.0. 
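
The scaling step can be illustrated with a short Python sketch. It is only a model of the behavior described above: the function name is hypothetical, the noise is assumed to be uniform between minus and plus the requested maximum, and rounding to the nearest integer is an assumption about how the output integers are formed.

```python
import random


def scale_speech(samples, target=2000.0, max_noise=0.0, seed=0):
    """Optionally add uniform noise, then scale so the peak magnitude is `target`.

    Returns the scaled integer samples and the peak found before scaling.
    """
    rng = random.Random(seed)
    noisy = [s + rng.uniform(-max_noise, max_noise) for s in samples]
    peak = max(abs(s) for s in noisy)
    if peak == 0:
        return [0] * len(noisy), peak
    return [int(round(s * target / peak)) for s in noisy], peak


scaled, peak = scale_speech([0, 2400, -4800])
print(scaled, peak)
```

Adding the noise before the peak search is what keeps the noisy file within the same maximum of 2000.0.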

What follows is an example of a console session using
SCALE1 with noise being added to a speech file (the user's
entries are in parentheses and added comments are in
quotes):

(SCALE1)

WARNING:  THE  INPUT  FILE  MUST  BE  AN  INTEGER  FILE  AND  BE 
IN  BLOCKED  FORM. 

DO  YOU  WISH  TO  CONTINUE?  (1-Y,  0-N) 

(1) 

INPUT  FILENAME: 

(FILE1) 

OUTPUT  FILENAME: 

(FILEOUT) 


OUTPUT  FILE  SIZE? 


(88)  "This  is  in  blocks  and  is  the  maximum 
size  allowed  by  AUDIOHIST" 

PERFORM  NOISE  ADDITION?  (1-Y,  0-N) 

(1) 

SIZE  OF  MAX  NOISE?  (REAL) 

(500.0) 

THE  FOLLOWING  NO.  OF  POINTS  WERE  CHECKED  22528 
AND  THE  MAX  VALUE  FOUND  WAS  4800 

If  the  program  had  been  run  without  noise  the  only 
difference  would  be  that  the  question  "SIZE  OF  MAX  NOISE?" 
would  not  appear. 

The next program to be described is PLTTER. One input
file is required and its file name is entered at the
console. This file can be either the output file of a pitch
detector for a pitch plot or a scaled speech file (maximum
of 2000.0) for a speech plot. If a pitch plot is required,
two plots are output on the PRINTRONIX printer. The top plot
is a plot of the pitch period and the bottom is a plot of
silence (a "1") or not silence (a "0"). The second possible
output is a plot of the speech file. Each page has ten
plots of speech except the last page, which has the number
of plots required to plot the remaining speech.

What follows is an example of a console session using
PLTTER:

(PLTTER) 

FILE  NAME? 


(FILE2) 


FILE  OUTPUT  FROM  PITCH?  (1-Y,  0-N) 

(1) 

In  the  example  above  a  pitch  plot  would  have  been 
produced.  If  a  0  had  been  entered  instead  of  a  1  a  speech 
plot  would  have  been  produced. 

The  next  program,  SETUP,  requires  one  input  file  and 
either  uses  the  input  file  as  an  output  or  uses  an 
additional  file  for  output.  The  purpose  of  this  program  is 
to  allow  the  user  to  change  the  initial  conditions  of  the 
vocoder  by  changing  the  initial  conditions  file.  The  input 
file  is  a  file  already  containing  initial  conditions.  If  a 
new  file  is  to  be  created  the  program  sets  all  values  to 
zero  and  then  the  user  inputs  initial  conditions  from  the 
console.  A  list  of  the  initial  conditions  can  be  output  to 
the  PRINTRONIX  printer  and/or  the  console.  Finally  the 
initial  conditions  can  overwrite  the  input  file  or  be  put  in 
a  new  file. 

What  follows  is  an  example  of  a  console  session  using 
SETUP: 

(SETUP) 

OUTPUT AND/OR INPUT FILE CURRENTLY EXIST? (1-YES, 0-NO)

(1)  "All input and output files must exist before
using this program"

FILE NAME?

(PARAM)  "This is the file containing initialization
data or the new file to be created if starting from scratch"

IF YOU CHOOSE TO CHANGE A VARIABLE ENTER Y OTHERWISE DO
A CARRIAGE RETURN

THE CURRENT VALUE OF CLPLEV (in PITCH2) IS: .6
CHANGE VALUE?

(CR)  "carriage return"

CURRENT AUTOCORRELATION THRESHOLD (in PITCH2) IS: .3
CHANGE VALUE?

(Y)

INPUT NEW VALUE
(.35)

THE CURRENT VALUE OF VOICED/UN THRESH
(in PITCH2) IS: 20.0

"This questioning for each record continues until all
the records have been typed out. For a complete list see
the description of FILE3 given earlier in this appendix.
What follows is the rest of this session after all records
have been viewed or changed."

DO YOU WANT TO HAVE THE ARRAY TYPED (1-YES, 0-NO)

(0)  "If a 1 had been chosen the values
would have been typed on the console."

IS THE INPUT FILE THE OUTPUT FILE

(1)  "This will write the updated values out to the
input file. A 0 response would have had the
values written out to a new file. In that
case the user would enter a file name for the
output file"

PRINT ARRAY ON PRINTRONIXS? (1-Y, 0-N)

(0)  "A 1 choice produces a plot on the
PRINTRONIX of the data"

PROGRAM COMPLETED

The next program, KEVAL, has one input file and, if the
option is selected, an output file. The input file is a file
of predictor coefficients output from the vocoder. The
format of this file is given in the description of FILE5.

What follows is an example of a console session using
KEVAL:

(KEVAL)

BASICALLY THIS PROGRAM CAN PROVIDE ONE OF 3 DIFFERENT
OUTPUTS. THE INPUT FILE CONTAINS A SET OF LPC PREDICTOR
COEFFICIENTS AND THE OUTPUT CAN BE IN THE FOLLOWING FORMS.

1)  TEKTRONIX PLOTS OF IMPULSE RESPONSE, PHASE RESPONSE,
LINEAR MAG. RESPONSE AND LOG MAG. RESPONSE FOR EACH FILTER
SET;

2)  AN OUTPUT FILE CAN BE CREATED AS AN INPUT TO A 3-D
PLOT PROGRAM.

CREATE 3-D FILE (1-Y, 0-N)

(1)  "At this point it will be assumed the user
wants a 3-D plot. The other case will be
treated later"

HOW MANY COEF. SETS

(50)  "This means 50 log magnitude plots will be
output"

FILE NAME (OUTPUT)?

(FPLOT)  "This is the input file for PLOT3D"

FILTER COEFFICIENTS FILE NAME?

(FILE5)  "The output file from the speech analyzer"

ENTER NUMBER OF POINTS (MAX=100)

(50)  "This is the number of points to each log
magnitude plot"

"At this point the output file is created. What follows
is a continuation of the other choice if a 3-D plot isn't
wanted."

CREATE 3-D FILE (1-Y, 0-N)

(0)

FILTER  COEFFICIENTS  FILE  NAME? 

(FILE5) 

ENTER  NUMBER  OF  POINTS  (MAX=100) 

(50)  "This is how many points will be on each plot"

LINEAR MAGNITUDE PLOT? (2-CONNECTED POINTS, 1-VERTICAL
LINES, 0-NO)

(2)  "A plot of connected points will appear on
the TEKTRONIX 4010 screen. Choosing a 1 would
produce a plot of vertical lines and a 0 would
not provide a plot. When the user is finished
with the plot any key may be pressed to
continue."

"The next plot is the log magnitude plot, then a phase
plot, and finally the impulse response plot. For each of
these plots the options 0 thru 2 are given. After viewing
the plots of interest the next console question follows."

RUN AGAIN? (1-YES, 0-NO)

(0)  "The program ends"

If a 1 had been chosen, each of the plots picked the
first time would be shown for each subsequent set of filter
coefficients. This was done to avoid having to answer
questions between plots. If the user does not wish to
observe more plots, a 0 is entered after the RUN AGAIN?
question.
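
As a rough illustration of what one log magnitude plot contains, the Python sketch below evaluates the all-pole filter G / A(z) on the unit circle, with A(z) = 1 - sum over k of a(k) z^(-k). The frequency grid (equal steps from 0 up to, but not including, half the sampling rate), the sign convention and the function name are all assumptions; KEVAL's own computation is not reproduced here.

```python
import cmath
import math


def log_magnitude_response(predictor, points, gain=1.0):
    """Return 20*log10 |gain / A(e^{jw})| at `points` frequencies in [0, pi)."""
    response = []
    for i in range(points):
        w = math.pi * i / points
        a = 1.0 - sum(c * cmath.exp(-1j * w * k)
                      for k, c in enumerate(predictor, start=1))
        response.append(20.0 * math.log10(gain / abs(a)))
    return response


print(log_magnitude_response([0.5], 2))
```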

The final utility is FORMANT, a program that was
originally developed to create a file containing the output
of an artificial vowel filter that is excited by an impulse
at a constant frequency. An input file can be used to enter
the vowel's filter either as coefficients in direct form or
as poles and zeros. If the input is a file of poles and
zeros then these are converted to coefficients in the
direct form.
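
The conversion from roots to direct form coefficients amounts to multiplying out one factor per root. A compact Python sketch of the idea is given here; it uses complex arithmetic with each root of a conjugate pair listed explicitly, which is a simplification chosen for the example and not a transcription of the thesis subroutine EXPAND.

```python
def expand_roots(roots, gain=1.0):
    """Coefficients of gain * prod(1 - r * z^-1), lowest power of z^-1 first.

    Complex roots must appear together with their conjugates so that the
    resulting coefficients are real.
    """
    poly = [complex(1.0)]
    for r in roots:
        nxt = poly + [0j]
        for i, c in enumerate(poly):
            nxt[i + 1] -= r * c   # multiply the running product by (1 - r z^-1)
        poly = nxt
    return [gain * c.real for c in poly]


# A conjugate pole pair at 0.5 +/- 0.5j gives 1 - z^-1 + 0.5 z^-2.
print(expand_roots([0.5 + 0.5j, 0.5 - 0.5j]))
```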

What  follows  is  a  console  session  using  FORMANT: 

THE  MAX  COEFFICIENTS  ALLOWED  ARE  50  EACH 

FILE  NAME  (OUTPUT)? 

(FVOWEL)  "This is the file the artificial vowel
will be put in"

HOW  MANY  BLOCKS?  (30  OR  LESS) 

(20)  "FVOWEL  will  be  20  blocks  long" 

WHAT  IS  THE  IMPULSE  INPUT  FREQUENCY? 

(100)  "Once  every  100  samples  the  filter  will  be 
impulsed" 

ARE THE FILTER COEFFICIENTS IN A FILE (1-YES, 0-NO)?

(1) 

FILE  NAME  (INPUT)? 

(INFILE) 

INPUT POLES AND ZEROS (1-YES, 0-NO)

"A pole zero file is in a record format. The first
record is the number of zeros (z) and the second is the
number of poles (p). The next z records are the zeros and
then the next p records contain the poles."

SAVE RESULTS IN DIR. FORM (1-Y, 0-N)

(0)  "If saved in direct form the file is configured
to be an input to Kathy Ward's program EVAL in
the user's book"

COMPLETED  OUTPUTTING  THE  IMPULSE  RESPONSE 
DO  YOU  WANT  A  PLOT?  (1-Y,  0-N) 


USE  PRINTRONIX?  (1-Y,  0-N) 

(1)  "The  impulse  response  will  be  plotted  on  the 
PRINTRONIX.  A  choice  of  0  would  plot  the  impulse  response 
on  the  TEKTRONIX  4010-1" 


When the question ARE THE FILTER COEFFICIENTS IN A FILE
is asked, answering no requires the user to enter the data
from the console. This method is self-explanatory since the
program asks for the data item by item.